Project-2

A Study in Parallel Algorithms: Stream Compaction

As shown in the "Scan Performance" graph, the serial version of scan performs better for small arrays. However, as the size of the array increases, the CUDA version becomes faster. I suspect there is something wrong with my shared memory version of scan, because I expected it to be faster than the global memory version in all cases. What I found instead is that for smaller arrays the global memory version, like the serial version, is faster than the shared memory version.
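For readers who want to see what is being measured, here is a minimal host-side sketch of the two scan formulations, not the code from this project. It assumes an exclusive scan over `int` values; the function names `serialExclusiveScan` and `blellochExclusiveScan` are hypothetical. The second function emulates on the CPU the work-efficient up-sweep/down-sweep scheme described in the GPU Gems 3 chapter listed under REFERENCES, where each inner-loop iteration stands in for the work of one GPU thread.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Serial exclusive scan: out[i] = in[0] + ... + in[i-1], with out[0] = 0.
std::vector<int> serialExclusiveScan(const std::vector<int>& in) {
    std::vector<int> out(in.size());
    int running = 0;
    for (std::size_t i = 0; i < in.size(); ++i) {
        out[i] = running;
        running += in[i];
    }
    return out;
}

// Work-efficient exclusive scan (up-sweep, then down-sweep), run serially.
// Hypothetical illustration: on a GPU, each inner-loop iteration would be
// handled by a separate thread, with a barrier between strides.
std::vector<int> blellochExclusiveScan(const std::vector<int>& in) {
    const std::size_t n = in.size();
    if (n == 0) return {};

    // Pad with zeros up to the next power of two so the tree is balanced.
    std::size_t padded = 1;
    while (padded < n) padded *= 2;
    assert(padded >= n);
    std::vector<int> data(padded, 0);
    for (std::size_t i = 0; i < n; ++i) data[i] = in[i];

    // Up-sweep (reduce): build partial sums up the tree.
    for (std::size_t stride = 1; stride < padded; stride *= 2) {
        for (std::size_t i = 2 * stride - 1; i < padded; i += 2 * stride) {
            data[i] += data[i - stride];
        }
    }

    // Down-sweep: clear the root, then push prefixes back down the tree.
    data[padded - 1] = 0;
    for (std::size_t stride = padded / 2; stride >= 1; stride /= 2) {
        for (std::size_t i = 2 * stride - 1; i < padded; i += 2 * stride) {
            const int left = data[i - stride];
            data[i - stride] = data[i];
            data[i] += left;
        }
    }

    data.resize(n);  // drop the padding
    return data;
}
```

The serial loop does n additions in a single pass, while the tree version needs about 2·log2(n) dependent steps. On a GPU, each of those steps also costs a synchronization, which is one plausible reason a parallel scan only pays off once the array is large enough.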

I also compared my CUDA stream compaction implementation with the Thrust version. The Thrust version is faster than my CUDA implementation in all cases. As mentioned above, I do not think I have managed to produce a well-optimized version of these algorithms.
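As a reminder of how stream compaction builds on scan, the sketch below runs the usual three steps (flag, exclusive scan, scatter) serially on the host. It is an illustration under assumptions rather than the project's kernel: the name `compactNonZero` is hypothetical, and "keep the non-zero elements" is an assumed predicate. On the Thrust side, `thrust::copy_if` provides comparable functionality, though the exact Thrust call benchmarked here is not stated.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stream compaction via flag -> exclusive scan -> scatter.
// Assumption for illustration: elements equal to zero are discarded.
std::vector<int> compactNonZero(const std::vector<int>& in) {
    const std::size_t n = in.size();

    // Step 1: mark the elements to keep (1) or drop (0).
    std::vector<int> flags(n);
    for (std::size_t i = 0; i < n; ++i) {
        flags[i] = (in[i] != 0) ? 1 : 0;
    }

    // Step 2: an exclusive scan of the flags gives each kept element
    // its index in the output; the final running total is the output size.
    std::vector<std::size_t> indices(n);
    std::size_t running = 0;
    for (std::size_t i = 0; i < n; ++i) {
        indices[i] = running;
        running += static_cast<std::size_t>(flags[i]);
    }

    // Step 3: scatter the kept elements to their scanned positions.
    std::vector<int> out(running);
    for (std::size_t i = 0; i < n; ++i) {
        if (flags[i]) {
            assert(indices[i] < out.size());
            out[indices[i]] = in[i];
        }
    }
    return out;
}
```

Steps 1 and 3 are independent per element, so in a GPU version they map to one thread per element, and the scan in step 2 is where most of the optimization effort goes.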

REFERENCES

"Parallel Prefix Sum (Scan) with CUDA." GPU Gems 3.

