Concurrent Access to Data Structures in C++
Summary
This module enables students to experiment with creating a task-parallel solution to the problem of crawling the web by using C++ with Boost threads and thread-safe data structures available in the Intel Threading Building Blocks (TBB) library. We provide enough code for a sequential solution, which the students can use as the basis for creating a task-parallel threaded solution. We provide some code to get students started with the threaded solution, because in a typical use case this may the first threaded program those students have encountered.
Learning Goals
- Students should be able to define what a thread-safe data structure object is
- Given a definition of an object, students should be able to distinguish whether it is properly thread-safe or not and why
- Students should be able to describe how locking may be used to ensure proper concurrent access to data structures and how deadlock is avoided
- Given a problem that can be improved using parallel access to shared data, students should be able to use built-in thread-safe classes to implement parallel solutions and determine the speedup of their solution
Context for Use
This module is typically implemented in an introductory (2nd course) or intermediate-level Java-based course. It may be taught while introducing the concepts of threads and thread-safe data structures. An instructor could present the materials (handout, slideshow presentation) in class and assign exercises as a lab or homework.
Description and Teaching Materials
You can visit the module in your browser:
Concurrent Access to Data Structures in C++
or you can download the module in either PDF format or latex format.
PDF Format: Concurrent Access to Data Structures in C++.pdf.
Latex Format: Concurrent Access to Data Structures in C++.tar.gz.
Word Format: Concurrent Access to Data Structures in C++.docx.
This is the code you can use to get students started on a lab or homework assignment:
CDS C++ student code 20120301 (gzip Archive 5kB Mar1 12)
Instructors: If you want example completed code, please contact Dick Brown, rab at stolaf dot edu.
Teaching Notes and Tips
This module utilizes the Boost package, a library of enhanced capabilities for C++. Several of these capabilities have become incorporated in the new C++11 standard, including the Boost support for programming with threads (which we will use). The module also uses a TBB Container data structure in order to obtain a thread-safe implementation, and the CURL library for conveniently accessing web information.
In the code archive:
- Two versions of web-crawler code are provided, named serial (a mostly complete sequential implementation) and parallel (a mostly complete multithreaded implementation, omitting two key code segments for students to fill in). These implementations each contain multiple source files, located in subdirectories src and include, and a Makefile for building the application (using separate directories for object files, and resulting binary).
- main() is within src/spider_driver.cpp (in each implementation).
- Two additional versions, serial_all and parallel_all, contain the complete code (please request).
The concept of the lab/homework is that students should have worked out a sequential version of the Spider already. You could do this by explaining it to them, or by having them work on that first. We have used this in the past by staging assignments so that they get some parts of the code and then must complete to make a working spider. What is provided here is the already working sequential version.
You may feel like using different data structures for parts of this assignment- we realize there are other ways to hold the information kept by the crawler. However, the TBB Containers data structures we chose for the threaded version fit the problem well. (Feedback/suggestions? Please enter below.)
The lecture/class materials present the concepts of using multiple threads that share data structures in memory. Since the intended audience is introductory to intermediate students, we include a rather gentle introduction to the issue of "data race condition" errors, and explain how the designers of the data structures in the TBB Containers library avoided such errors.
Assessment
References and Resources
Comment? Start the discussion about Concurrent Access to Data Structures in C++