vectorization of tuple_map, tuple_reduce #10

@andrewcorrigan

Do you have any thoughts on achieving vectorization via tuple_map, tuple_reduce, or any other operations in this library?

Over the years, I have experimented with implementing my own for_each_n using OpenMP/TBB and various versions of Intel's vectorization directives, as would be allowed by a call to for_each_n with parallel_vector_execution_policy. This was unsuccessful for non-trivial loops. I could investigate that approach further; the failure was probably just a result of limitations in older versions of icpc and my own lack of knowledge about vectorization. I could also just wait to try out implementations of parallel_vector_execution_policy...

However, your library got me thinking that vectorization could perhaps be achieved more explicitly: instead of vectorizing entire loops, vectorize inside each iteration of these loops, by vectorizing each individual tuple operation (implementable by calls to tuple_map, tuple_reduce, etc.). Typical tuple sizes are currently O(10).

The general pattern (in pseudocode) I deal with is:

for(auto cell : grid)    // <-- parallelize here; requesting vectorization here has not worked (pragma ivdep, simd, etc.)
{
   // read tuple data in from memory  -- tuple_size is O(10)

   // tuple_map    <-- vectorize here
   // tuple_reduce <-- vectorize here
   // tuple_map    <-- vectorize here
   // etc.

   // store tuple data to memory
}
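
To make the pattern above concrete, here is a minimal C++ sketch of what I have in mind. It is not this library's API: tuple_map_sketch, tuple_reduce_sketch, Cell, kTupleSize, and update_grid are hypothetical names invented for illustration, and the tuple is assumed to be homogeneous so that it can be stored in a std::array. The inner loops have a compile-time trip count, which is the kind of loop a compiler may be able to unroll and vectorize on its own, though whether it actually does so depends on the compiler and flags.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for illustration only; these are NOT the
// library's tuple_map / tuple_reduce and assume a homogeneous tuple.

// Apply a binary function element-wise to two fixed-size tuples.
template <typename T, std::size_t N, typename F>
std::array<T, N> tuple_map_sketch(const std::array<T, N>& a,
                                  const std::array<T, N>& b, F f) {
    std::array<T, N> out{};
    // Trip count N is known at compile time, so this loop is a
    // candidate for unrolling and SIMD code generation.
    for (std::size_t i = 0; i < N; ++i) {
        out[i] = f(a[i], b[i]);
    }
    return out;
}

// Fold a fixed-size tuple into a single value, left to right.
template <typename T, std::size_t N, typename F>
T tuple_reduce_sketch(const std::array<T, N>& a, T init, F f) {
    T acc = init;
    for (std::size_t i = 0; i < N; ++i) {
        acc = f(acc, a[i]);
    }
    return acc;
}

constexpr std::size_t kTupleSize = 8;  // assumed size, on the order of 10
using Cell = std::array<float, kTupleSize>;

// One pass over the grid following the map / reduce / map pattern.
void update_grid(std::vector<Cell>& grid, const Cell& weights) {
    for (Cell& cell : grid) {  // outer loop: parallelize here
        // tuple_map: scale each component by its weight
        const Cell scaled = tuple_map_sketch(
            cell, weights, [](float x, float w) { return x * w; });
        // tuple_reduce: sum the scaled components
        const float total = tuple_reduce_sketch(
            scaled, 0.0f, [](float acc, float x) { return acc + x; });
        // tuple_map: combine scaled and original values using the sum
        cell = tuple_map_sketch(
            scaled, cell, [total](float s, float x) { return s + total * x; });
    }
}
```

One caveat on the reduction: a floating-point sum has a fixed evaluation order, so a compiler will generally not reorder it into SIMD lanes unless that is explicitly permitted, for example with an OpenMP simd reduction clause or relaxed floating-point flags. The element-wise maps have no such dependency.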
