Closed
Description
Hello,
When I run the MNIST demo shipped with official Caffe on 4 K40 GPUs, the softmax loss jumps to 87.3365 after a few hundred iterations and stays there. With one or two GPUs, the same demo trains correctly. I am using the latest Caffe (f28f5ae).
Has anyone encountered the same problem?
Thanks.
Here is the log from the run with 4 GPUs:
I0709 10:34:20.722365 16617 solver.cpp:337] Iteration 0, Testing net (#0)
I0709 10:34:24.325662 16617 solver.cpp:404] Test net output #0: accuracy = 0.0761
I0709 10:34:24.325714 16617 solver.cpp:404] Test net output #1: loss = 2.34995 (* 1 = 2.34995 loss)
I0709 10:34:24.384223 16617 solver.cpp:228] Iteration 0, loss = 2.2995
I0709 10:34:24.384263 16617 solver.cpp:244] Train net output #0: loss = 2.2995 (* 1 = 2.2995 loss)
I0709 10:34:24.384330 16617 sgd_solver.cpp:106] Iteration 0, lr = 0.01
I0709 10:34:29.217810 16617 solver.cpp:228] Iteration 100, loss = 0.548973
I0709 10:34:29.217882 16617 solver.cpp:244] Train net output #0: loss = 0.548973 (* 1 = 0.548973 loss)
I0709 10:34:29.242552 16617 sgd_solver.cpp:106] Iteration 100, lr = 0.00992565
I0709 10:34:34.147747 16617 solver.cpp:228] Iteration 200, loss = 0.49398
I0709 10:34:34.147788 16617 solver.cpp:244] Train net output #0: loss = 0.49398 (* 1 = 0.49398 loss)
I0709 10:34:34.161823 16617 sgd_solver.cpp:106] Iteration 200, lr = 0.00985258
I0709 10:34:39.072245 16617 solver.cpp:228] Iteration 300, loss = 0.829506
I0709 10:34:39.072304 16617 solver.cpp:244] Train net output #0: loss = 0.829506 (* 1 = 0.829506 loss)
I0709 10:34:39.072336 16617 sgd_solver.cpp:106] Iteration 300, lr = 0.00978075
I0709 10:34:43.944337 16617 solver.cpp:228] Iteration 400, loss = 0.194765
I0709 10:34:43.944378 16617 solver.cpp:244] Train net output #0: loss = 0.194765 (* 1 = 0.194765 loss)
I0709 10:34:43.944406 16617 sgd_solver.cpp:106] Iteration 400, lr = 0.00971013
I0709 10:34:48.838815 16617 solver.cpp:337] Iteration 500, Testing net (#0)
I0709 10:34:51.885664 16617 solver.cpp:404] Test net output #0: accuracy = 0.1009
I0709 10:34:51.885836 16617 solver.cpp:404] Test net output #1: loss = 87.3365 (* 1 = 87.3365 loss)
I0709 10:34:51.895921 16617 solver.cpp:228] Iteration 500, loss = 87.3365
I0709 10:34:51.895948 16617 solver.cpp:244] Train net output #0: loss = 87.3365 (* 1 = 87.3365 loss)
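A note on the value 87.3365 itself: it matches -log(FLT_MIN), where FLT_MIN is the smallest positive normalized 32-bit float. The sketch below only checks that arithmetic. It assumes (this is not shown in the log) that the softmax loss layer clamps the true-class probability at FLT_MIN before taking the log; the helper name `clamped_log_loss` is hypothetical and is not part of Caffe. If that assumption holds, a loss of exactly 87.3365 would mean the predicted probability of the correct class has collapsed to zero, i.e. training has diverged rather than merely slowed down.

```python
import math

# Smallest positive normalized 32-bit float (FLT_MIN in C's <float.h>).
FLT_MIN = 2.0 ** -126


def clamped_log_loss(prob_true_class):
    """Negative log-likelihood with the probability clamped at FLT_MIN.

    Assumption: this imitates a clamp of the form log(max(p, FLT_MIN))
    in the loss layer. The function name is made up for illustration.
    """
    return -math.log(max(prob_true_class, FLT_MIN))


# A collapsed prediction (probability 0 for the true class) saturates the loss.
print(round(clamped_log_loss(0.0), 4))  # 87.3365

# For comparison: a uniform guess over 10 digits gives about 2.3026,
# close to the loss reported at iteration 0.
print(round(clamped_log_loss(0.1), 4))  # 2.3026
```

The test accuracy of 0.1009 at iteration 500 is also what a 10-class classifier gets by always predicting the same digit, which is consistent with this reading.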