[SPARK-10478][ML] Performance, organization, and style improvements for multi-layer perceptron #8648

feynmanliang · 2015-09-07T21:43:19Z

Replaces manual iteration loops with UFuncs and vectorized, broadcast operations
Refactors into more idiomatic Scala (e.g. while and for -> foreach and map, System.arraycopy -> ++)
Fixes various style issues (4-indent method args, wrap one-line ifs in {})
Adds comments and improves scaladocs (grammar, punctuation)

Notes to Reviewers

Using UFuncs simplifies the ActivationFunction code significantly, at the cost of more complicated implementations for crossEntropy, derivative, etc. I'm not sure whether the tradeoff is worth it.
There is a slight performance hit in SoftmaxFunction.eval, since we add an additional iteration over each column of x: previously, computing exp(x - maxVal) and accumulating the sum happened in the same loop, whereas now exp(x - maxVal) is computed in one loop and the sum is computed afterwards. I feel this is acceptable given the significant reduction in complexity. Thoughts?
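
To make the SoftmaxFunction.eval tradeoff concrete, here is a minimal sketch on plain Scala arrays. It is not the code from this PR: the object and method names are hypothetical, it handles a single column rather than a matrix, and it uses the standard library instead of Breeze. It only contrasts a fused loop (exp and sum together) with the two-pass form (exp first, sum afterwards).

```scala
// Hypothetical illustration; names are not taken from Layer.scala.
object SoftmaxSketch {

  // Fused form: exp(x - maxVal) and the running sum are computed in one loop.
  def softmaxFused(x: Array[Double]): Array[Double] = {
    val maxVal = x.max
    val out = new Array[Double](x.length)
    var sum = 0.0
    var i = 0
    while (i < x.length) {
      out(i) = math.exp(x(i) - maxVal)
      sum += out(i)
      i += 1
    }
    // Normalize in place so the entries add up to 1.
    i = 0
    while (i < x.length) {
      out(i) /= sum
      i += 1
    }
    out
  }

  // Two-pass form: one traversal for exp, a second traversal for the sum.
  def softmaxTwoPass(x: Array[Double]): Array[Double] = {
    val maxVal = x.max
    val exps = x.map(v => math.exp(v - maxVal))
    val sum = exps.sum
    exps.map(_ / sum)
  }
}
```

Both forms subtract the maximum before exponentiating, so they agree up to floating-point rounding; the two-pass form simply pays for one extra traversal and the intermediate arrays.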

CC @avulanov @mengxr

SparkQA · 2015-09-07T21:52:07Z

Test build #42107 has finished for PR 8648 at commit 84f8bea.

This patch fails Scala style tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2015-09-08T02:47:31Z

Test build #42113 has finished for PR 8648 at commit 22ba174.

This patch fails Scala style tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2015-09-08T06:15:01Z

Test build #42121 has finished for PR 8648 at commit abdba81.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2015-09-08T17:56:37Z

Test build #42137 has finished for PR 8648 at commit f6731ff.

This patch passes all tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
- case class BlockFetchException(messages: String, throwable: Throwable)

avulanov · 2015-09-08T19:32:42Z

mllib/src/main/scala/org/apache/spark/ml/ann/Layer.scala

Flatten might be expensive for an array of large arrays, isn't it?

avulanov · 2015-09-08T19:41:41Z

@feynmanliang Thank you for reviewing the code! I made one pass. It seems that UFunc simplifies it a lot. However, I am not sure about .flatten and .flatMap on an array of large arrays. We need a performance comparison. Could you run the benchmark from https://github.com/avulanov/ann-benchmark before and after the refactoring to see the difference?
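
For context on the .flatten concern, a rough sketch of the two concatenation styles under discussion, using only the Scala and Java standard libraries. The object and method names are hypothetical and this is not the code from Layer.scala. Which one is faster depends on the Scala version and array sizes, so it would still need to be measured.

```scala
// Hypothetical illustration; names are not taken from the PR.
object ConcatSketch {

  // Functional style: let the collections library build the result.
  def concatFlatten(parts: Array[Array[Double]]): Array[Double] =
    parts.flatten

  // Manual style: preallocate the output once, then bulk-copy each part.
  def concatArraycopy(parts: Array[Array[Double]]): Array[Double] = {
    val out = new Array[Double](parts.map(_.length).sum)
    var offset = 0
    parts.foreach { p =>
      System.arraycopy(p, 0, out, offset, p.length)
      offset += p.length
    }
    out
  }
}
```

Both return the same elements in the same order; the manual version makes the single allocation and the per-part copies explicit.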

feynmanliang · 2015-09-11T19:08:26Z

@avulanov The benchmarking code is written against a WIP implementation; I sent you a PR for bringing it up to date.

LBFGS is taking a significantly long time on my machine:

I've removed the flatten/flatMap changes from this PR and will save them for when I have more time to properly perf test.

SparkQA · 2015-09-11T20:07:22Z

Test build #42348 has finished for PR 8648 at commit f56e2d5.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

avulanov · 2015-09-14T23:40:42Z

@feynmanliang I suggest using native BLAS for testing. It is worth checking the impact of using UFunc as well.

mengxr · 2015-09-22T21:28:46Z

mllib/src/main/scala/org/apache/spark/ml/ann/Layer.scala

@feynmanliang Could you run some micro-benchmark on this function? I think this is the only place that might cause performance issues.

feynmanliang · 2015-10-10

@mengxr Local benchmarks here. Performance improves across the board except for (n=100000, k=50).

feynmanliang · 2015-10-17T07:56:58Z

@mengxr Added benchmarks. Can you make another pass when you have a chance?

Feynman Liang added 5 commits September 7, 2015 13:00

Documentation and indentation fixes

611c76f

Refactors unneeded helpers

7b192db

More doc and style fixes

12169d7

Cleans up documentation and uses functional code

bc52b65

Vectorizes linalg using ufuncs and vector ops

84f8bea

Feynman Liang added 2 commits September 7, 2015 19:35

Cleans up typos in BreezeUtil

1f8ef66

Fixes style issues

22ba174

Fixes style errors

abdba81

feynmanliang changed the title ~~[SPARK-10478][ML] Performance, organization, and style improvements for multi-layer perceptron~~ [SPARK-10478][ML][WIP] Performance, organization, and style improvements for multi-layer perceptron Sep 8, 2015

Reverts noops and fixes unit tests

f6731ff

feynmanliang changed the title ~~[SPARK-10478][ML][WIP] Performance, organization, and style improvements for multi-layer perceptron~~ [SPARK-10478][ML] Performance, organization, and style improvements for multi-layer perceptron Sep 8, 2015

avulanov reviewed Sep 8, 2015

Reverts flatten/flatMap changes

f56e2d5

mengxr reviewed Sep 22, 2015

avulanov mentioned this pull request Oct 22, 2015

[SPARK-11262][ML] Unit test for gradient, loss layers, memory management for multilayer perceptron #9229

Closed

asfgit closed this in 66ec249 May 20, 2016
