vectorization of tuple_map, tuple_reduce

Do you have any thoughts on achieving vectorization via `tuple_map`, `tuple_reduce`, or any other operations in this library?

Over the years, I have experimented with implementing my own `for_each_n` using OpenMP/TBB together with various versions of Intel's vectorization directives, aiming for the behavior that a call to `for_each_n` with `parallel_vector_execution_policy` would permit.  For non-trivial loops, this was unsuccessful.  The failure was probably due to limitations of older versions of `icpc` and my own lack of knowledge about vectorization, so I could certainly investigate that approach further.  I could also simply wait and try out implementations of `parallel_vector_execution_policy`.

However, your library got me thinking that vectorization could perhaps be achieved more explicitly: instead of vectorizing entire loops, vectorize _inside_ each iteration of these loops, by vectorizing each individual tuple operation (implementable by calls to `tuple_map`, `tuple_reduce`, etc.).  Typical tuple sizes are currently on the order of 10 elements.

The general pattern (in pseudocode) I deal with is:

```
for(auto cell : grid)    // <-- parallelize here; requesting vectorization here (pragma ivdep, simd, etc.) has not worked
{
   // read tuple data in from memory (tuple_size is O(10))

   // tuple_map    <-- vectorize here
   // tuple_reduce <-- vectorize here
   // tuple_map    <-- vectorize here
   // etc.

   // store tuple data to memory
}
```
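To make the question concrete, here is a minimal sketch of what "vectorize inside each iteration" might look like. It assumes the tuples are homogeneous, so that a `std::array<float, N>` can stand in for them. The names `array_map`, `array_sum`, and `update_cell` are hypothetical and are not part of this library's API. Whether a given compiler actually emits SIMD instructions for these loops depends on the compiler, the flags, and the functor bodies.

```cpp
#include <array>
#include <cstddef>

// Hypothetical stand-in for a binary tuple_map on a homogeneous "tuple".
// N is a compile-time constant and the iterations are independent, so the
// loop is a candidate for unrolling and SIMD code generation.
template <class T, std::size_t N, class F>
std::array<T, N> array_map(const std::array<T, N>& a,
                           const std::array<T, N>& b, F f)
{
    std::array<T, N> out{};
    for (std::size_t i = 0; i < N; ++i)
        out[i] = f(a[i], b[i]);
    return out;
}

// Hypothetical stand-in for a tuple_reduce that sums the elements.
// For floating-point T, compilers generally keep this loop scalar unless
// they are allowed to reassociate the additions (relaxed floating-point
// flags, or an explicit SIMD reduction directive).
template <class T, std::size_t N>
T array_sum(const std::array<T, N>& a)
{
    T acc = T{};
    for (std::size_t i = 0; i < N; ++i)
        acc += a[i];
    return acc;
}

// One per-cell update in the shape of the pseudocode: map, reduce, map.
template <std::size_t N>
std::array<float, N> update_cell(const std::array<float, N>& u,
                                 const std::array<float, N>& w)
{
    // tuple_map: elementwise product
    auto prod = array_map(u, w, [](float x, float y) { return x * y; });
    // tuple_reduce: sum of the products
    float total = array_sum(prod);
    // tuple_map: combine each element with the reduced value
    return array_map(u, prod,
                     [total](float x, float p) { return x + total * p; });
}
```

The outer `for(auto cell : grid)` loop would call something like `update_cell` once per cell and remain the place to parallelize. With N around 10 and 4- or 8-wide SIMD registers, each operation would fill only one or two full vectors plus a remainder, so any gain from this approach is likely to be modest and would need to be measured.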

