Instructor Example: Optimizing CUDA for GPU Architecture

Jeffrey Lyman and Libby Shoop, Macalester College
Initial Publication Date: August 13, 2014


NVIDIA GPUs use a massively parallel architecture to execute programs written in CUDA C efficiently. This module explains how to take advantage of that architecture to get the most speedup from your CUDA applications, using a Mandelbrot set generator as an example. The code is provided and parts of it are explained. The module is intended as a resource for instructors preparing lectures, though it can also be given to students as is.


Learning Goals

The goal of this module is to provide instructors with

  • a real code example that makes use of thousands of cores on a GPU;
  • some information about the CUDA architecture;
  • plots of data that illustrate how the number of blocks and the number of threads per block affect the performance of the program.
Students could read through this example on their own, or instructors could use it to prepare an in-class presentation whose goal is to show students how to choose the number of blocks and threads per block to take advantage of the GPU architecture.

Context for Use

This module could be used in a computer systems course after introducing the CUDA GPU architecture and getting students familiar with CUDA. It could also serve as a very brief introduction to GPU architecture and to the kinds of problems for which CUDA programming is useful.

Description and Teaching Materials

You can visit the module in your browser:
Optimizing CUDA for GPU Architecture

Or you can download the module in PDF, LaTeX, or Word format.
PDF Format: OptimizingCUDA_for_GPU_Architecture.pdf.
LaTeX Format: latex.tar.gz.
Word Format: OptimizingCUDA_for_GPU_Architecture.docx.

Teaching Notes and Tips

To run the code, you will need a machine with a CUDA-capable GPU, and you will need to install the CUDA Toolkit from NVIDIA. If you want to display the results of the Mandelbrot set computation, you will also need a machine that can display X11 graphics.


Assessment instrument not available.

References and Resources

You will need the CUDA Toolkit installed in order to build the CUDA code provided.
