This repository was archived by the owner on Nov 17, 2023. It is now read-only.
Closed
24 changes: 17 additions & 7 deletions src/operator/rnn_impl.h
@@ -34,6 +34,7 @@
#include <vector>
#include <string>
#include <utility>
#include <random>
#include "./math.h"
#include "./math_functions-inl.h"
#include "./operator_common.h"
@@ -149,7 +150,6 @@ void LstmForwardTraining(DType* ws,
const int r_size = D * T * N * H * 6;
const int y_offset = T * N * H * 5;
const int cell_size = N * H;
unsigned int seed_ = 17 + rand() % 4096; // NOLINT(runtime/threadsafe_fn)
int idx = 0; // state & cell state's idx;
const int omp_threads = mxnet::engine::OpenMP::Get()->GetRecommendedOMPThreadCount();
for (int i = 0; i < L; ++i) {
@@ -176,13 +176,17 @@ void LstmForwardTraining(DType* ws,
if (dropout > 0.0f) {
#pragma omp parallel for num_threads(omp_threads)
for (int j = 0; j < T * N * H * D; j++) {
int rand_data = rand_r(&seed_);
static thread_local std::random_device device;
Contributor: is it correct that we are seeding inside the loop?

Contributor (author): Interesting link, thanks. I would still prefer the explicit way for clarity.

Contributor (author):
> is it correct that we are seeding inside the loop?

Since it's static, we are doing it only once.

Contributor (author): For every thread.

Contributor: I see.

Member: This should probably be addressed at the framework level by providing APIs to get random numbers. We should not expect developers who implement operators to handle thread-local variables.

Contributor (author): @eric-haibin-lin In general, I don't see why a thread-local variable is an issue: there is a separate generator for every thread, and nothing needs to be done additionally to ensure thread safety. Otherwise, locking needs to be in place.

I could add a random-number-generation function to the framework API, but it wouldn't change the implementation. What do you think?

Contributor (author): And in the case of one function for all, there will also be complications with setting the seed if needed.

Member: I suggest referring to RandGenerator<xpu>, as I mentioned below. It handles the thread-safety problem without locking. The implementation is somewhat tricky, though.

static thread_local std::default_random_engine generator(device());
static thread_local std::uniform_int_distribution<int> distribution;
static thread_local auto dice = std::bind(distribution, generator);
int rand_data = dice();
if (static_cast<float>(rand_data % 1000) < static_cast<float>(1000 * dropout)) {
dropout_random[i * T * N * H * D + j] = 0;
y.dptr_[j] = 0;
} else {
dropout_random[i * T * N * H * D + j] = 1.0f - dropout;
y.dptr_[j] = y.dptr_[j] / (1.0f - dropout);
}
}
}
@@ -994,7 +998,6 @@ void GruForwardTraining(DType* ws,
DType* bx_l = bx;
DType* bh_l = bh;
DType* y_tmp = x_ptr;
unsigned int seed_ = 17 + rand() % 4096; // NOLINT(runtime/threadsafe_fn)
for (int l = 0; l < L; l++) {
if (l != 0) {
y_tmp = y_l;
@@ -1004,7 +1007,11 @@
const int omp_threads = mxnet::engine::OpenMP::Get()->GetRecommendedOMPThreadCount();
#pragma omp parallel for num_threads(omp_threads)
for (int i = 0; i < T * N * I; i++) {
int rand_data = rand_r(&seed_);
static thread_local std::random_device device;
static thread_local std::default_random_engine generator(device());
static thread_local std::uniform_int_distribution<int> distribution;
static thread_local auto dice = std::bind(distribution, generator);
int rand_data = dice();
if (static_cast<float>(rand_data % 1000) < static_cast<float>(1000 * dropout)) {
dropout_random[(l - 1) * T * N * I + i] = 0;
y_tmp[i] = 0;
@@ -1881,7 +1888,6 @@ void VanillaRNNForwardTraining(DType* ws,
DType* bh_l = bh;
DType* y_tmp = x_ptr;
const int omp_threads = mxnet::engine::OpenMP::Get()->GetRecommendedOMPThreadCount();
unsigned int seed_ = 17 + rand() % 4096; // NOLINT(runtime/threadsafe_fn)
for (int l = 0; l < L; l++) {
if (l != 0) {
y_tmp = y_l;
@@ -1890,7 +1896,11 @@
if (dropout > 0.0f && l > 0) {
#pragma omp parallel for num_threads(omp_threads)
for (int i = 0; i < T * N * I; i++) {
int rand_data = rand_r(&seed_);
static thread_local std::random_device device;
Contributor: Could we wrap this up in a function? Or would that interfere with thread_local? It seems we are duplicating code.

Contributor (author): I'm not sure where to put it; do you have any suggestions?

static thread_local std::default_random_engine generator(device());
static thread_local std::uniform_int_distribution<int> distribution;
static thread_local auto dice = std::bind(distribution, generator);
int rand_data = dice();
if (static_cast<float>(rand_data % 1000) < static_cast<float>(1000 * dropout)) {
dropout_random[(l - 1) * T * N * I + i] = 0;
y_tmp[i] = 0;
12 changes: 8 additions & 4 deletions tests/cpp/engine/threaded_engine_test.cc
@@ -33,6 +33,7 @@
#include <thread>
#include <chrono>
#include <vector>
#include <random>

#include "../src/engine/engine_impl.h"
#include "../include/test_util.h"
@@ -58,17 +59,20 @@ void GenerateWorkload(int num_workloads, int num_var,
int min_read, int max_read,
int min_time, int max_time,
std::vector<Workload>* workloads) {
static thread_local std::default_random_engine generator(seed_);
static thread_local std::uniform_int_distribution<int> distribution;
static thread_local auto dice = std::bind(distribution, generator);
workloads->clear();
workloads->resize(num_workloads);
for (int i = 0; i < num_workloads; ++i) {
auto& wl = workloads->at(i);
wl.write = rand_r(&seed_) % num_var;
int r = rand_r(&seed_);
wl.write = dice() % num_var;
int r = dice();
int num_read = min_read + (r % (max_read - min_read));
for (int j = 0; j < num_read; ++j) {
wl.reads.push_back(rand_r(&seed_) % num_var);
wl.reads.push_back(dice() % num_var);
}
wl.time = min_time + rand_r(&seed_) % (max_time - min_time);
wl.time = min_time + dice() % (max_time - min_time);
}
}

9 changes: 6 additions & 3 deletions tests/cpp/include/test_ndarray_utils.h
@@ -29,6 +29,7 @@
#include <cstdlib>
#include <string>
#include <map>
#include <random>
#include "test_util.h"
#include "test_op.h"

@@ -54,9 +55,11 @@ inline unsigned gen_rand_seed() {
}

inline float RandFloat() {
static unsigned seed = gen_rand_seed();
double v = rand_r(&seed) * 1.0 / RAND_MAX;
return static_cast<float>(v);
static thread_local std::random_device device;
static thread_local std::default_random_engine generator(device());
static thread_local std::uniform_real_distribution<float> distribution;
static thread_local auto dice = std::bind(distribution, generator);
return dice();
}

Contributor (@perdasilva, Dec 9, 2018): Could you add a RandInt method and use it in rnn_impl? I would also be interested to understand what happens memory-wise with these thread_locals, and the OpenMP parallelization, when I see you next =)

Contributor (author): The problem is that this file is in tests/.

Contributor: There's no facepalm emoticon, so I went with thumbs-up.

// Get an NDArray with provided indices, prepared for a RowSparse NDArray.