Instructor Example: Optimizing CUDA for GPU Architecture
Initial Publication Date: August 13, 2014
Summary
NVIDIA GPUs use a highly parallel architecture to execute massively parallel programs written in CUDA C efficiently. This module explains how to take advantage of that architecture to maximize the speedup of your CUDA applications, using a Mandelbrot set generator as an example. The code is provided, and parts of it are explained. The module is intended as a resource for instructors who wish to create lectures, though it can also be presented to students as is.
Learning Goals
The goal of this module is to provide instructors with
- a real code example that makes use of thousands of cores on a GPU;
- some information about the CUDA architecture;
- plots of data that illustrate how the number of blocks and the number of threads per block affect the performance of the program.
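The module's own source code is not reproduced on this page. As a rough illustration of the kind of experiment the goals above describe, the following is a minimal sketch of a Mandelbrot kernel whose block count and threads per block are taken from the command line and timed with CUDA events. The names `mandel_iters` and `mandel_kernel`, the image size, the iteration limit, and the region of the complex plane are all assumptions made for this sketch, not taken from the module.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Escape-time iteration count for the point c = (cx, cy).
// Hypothetical helper; callable from both host and device.
__host__ __device__ inline int mandel_iters(float cx, float cy, int max_iters) {
    float x = 0.0f, y = 0.0f;
    int i = 0;
    while (i < max_iters && x * x + y * y <= 4.0f) {
        float xt = x * x - y * y + cx;
        y = 2.0f * x * y + cy;
        x = xt;
        ++i;
    }
    return i;
}

// Grid-stride loop: every pixel is covered for any choice of
// blocks and threads per block, so the two can be varied freely.
__global__ void mandel_kernel(int *out, int width, int height, int max_iters) {
    int total = width * height;
    int stride = gridDim.x * blockDim.x;
    for (int idx = blockIdx.x * blockDim.x + threadIdx.x; idx < total; idx += stride) {
        int px = idx % width;
        int py = idx / width;
        float cx = -2.0f + 3.0f * px / width;   // real part in [-2, 1)
        float cy = -1.0f + 2.0f * py / height;  // imaginary part in [-1, 1)
        out[idx] = mandel_iters(cx, cy, max_iters);
    }
}

int main(int argc, char **argv) {
    // Launch configuration from the command line (assumed defaults: 64 x 256).
    int blocks  = argc > 1 ? atoi(argv[1]) : 64;
    int threads = argc > 2 ? atoi(argv[2]) : 256;
    const int width = 1024, height = 768, max_iters = 256;  // assumed sizes

    int *d_out = nullptr;
    if (cudaMalloc(&d_out, sizeof(int) * width * height) != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start);
    mandel_kernel<<<blocks, threads>>>(d_out, width, height, max_iters);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess) {
        fprintf(stderr, "kernel launch failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    printf("blocks=%d threads=%d time=%.3f ms\n", blocks, threads, ms);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d_out);
    return 0;
}
```

Assuming the file is saved as `mandel.cu`, it could be built with `nvcc mandel.cu -o mandel` and run as, for example, `./mandel 32 128`. Repeating the run over a range of block and thread counts would give timing data of the sort the plots summarize.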
Context for Use
This module could be used in a computer systems course after the CUDA GPU architecture has been introduced and students are familiar with CUDA. It could also serve as a very brief introduction to GPU architecture and to the kinds of problems for which CUDA programming is useful.
Description and Teaching Materials
You can visit the module in your browser:
Optimizing CUDA for GPU Architecture
Or you can download the module in PDF, LaTeX, or Word format.
PDF Format: OptimizingCUDA_for_GPU_Architecture.pdf.
LaTeX Format: latex.tar.gz.
Word Format: OptimizingCUDA_for_GPU_Architecture.docx.
Teaching Notes and Tips
To run the code, you will need a machine with a CUDA-capable GPU and the CUDA Toolkit from NVIDIA installed. To display the results of the Mandelbrot set computation, you will also need a machine that can display X11 graphics.
Assessment
Assessment instrument not available.
References and Resources
You will need the CUDA Toolkit installed in order to build the CUDA code provided.