SC13 CSinParallel Workshop:
Using map-reduce to teach data-intensive scalable computing across the CS curriculum
Wednesday 1:30-5:00 pm, November 20, 2013
Dick Brown, St. Olaf College; Libby Shoop, Macalester College; Joel Adams, Calvin College
SC13 Map-Reduce Workshop Part 1 -- slides (Acrobat (PDF) 2.5MB Nov20 13)
Abstract
Map-reduce, the cornerstone computational framework for cloud computing applications, has star appeal to draw students to the study of parallelism. Participants will carry out hands-on exercises designed for students at CS1/intermediate/advanced levels that introduce data-intensive scalable computing concepts, using WebMapReduce (WMR), a simplified open-source interface to the widely used Hadoop map-reduce programming environment, and using Hadoop itself. These hands-on exercises enable students to perform data-intensive scalable computations carried out on the most widely deployed map-reduce framework, used by Facebook, Microsoft, Yahoo, and other companies. WMR supports programming in a choice of languages (including Java, Python, C++, C#, Scheme); participants will be able to try exercises with languages of their choice. Workshop includes brief introduction to direct Hadoop programming, and information about access to cluster resources supporting WMR. Workshop materials will reside on csinparallel.org, along with WMR software. Intended audience: CS instructors. Laptop required (Windows, Mac, or Linux).
Part 1 -- Fundamentals; Introductory Courses
This first half of the workshop introduces map-reduce computing through the WebMapReduce (WMR) simplified interface to Hadoop, then shares our experience teaching map-reduce and related concepts of parallel and distributed computing to students in introductory sequences.
- Introductory presentation (1:30pm)
  - Goals of the workshop.
  - Map-reduce computing
  - Demonstration of WebMapReduce (WMR)
    Resources: Why teach map-reduce with WMR?; WMR users guide, including languages included (so far)
  - Teaching map-reduce with WMR in the introductory sequence: materials, teaching with frameworks, strategies, and experience
    Resources: Module; direct link to teaching materials
- Hands-on exercises (2:10pm)
  - Getting started with WMR
  - Exercises
    Resources: Intro to WMR module; see Using WMR, then Counting words with WMR (Python)
    Data sets on HDFS: /shared/gutenberg/CompleteShakespeare.txt, AnnaKarenina.txt, WarAndPeace.txt; /shared/gutenberg/all/group8
    Alternative explorations: WMR code examples in various languages
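For readers who want to preview the word-count exercise before logging in to WMR, the idea can be sketched in plain Python. This is a minimal local simulation, not the WMR API: the names `mapper`, `reducer`, and `run_local` are illustrative assumptions, as is the convention that each input line arrives as the key with an empty value. A real WMR job runs on a Hadoop cluster and emits pairs through the WMR library rather than with `yield`.

```python
from collections import defaultdict


def mapper(key, value):
    """Emit a (word, 1) pair for every word in one line of text.

    Assumption: the line of text arrives in `key`; `value` is unused.
    """
    for word in key.split():
        yield word.lower(), 1


def reducer(key, values):
    """Emit (word, total) by summing all counts collected for one word."""
    yield key, sum(values)


def run_local(lines):
    """Simulate map, shuffle (group by key), and reduce on one machine."""
    # Map phase: apply the mapper to each line and group values by key.
    groups = defaultdict(list)
    for line in lines:
        for k, v in mapper(line, ""):
            groups[k].append(v)

    # Reduce phase: call the reducer once per key, in sorted key order.
    results = {}
    for k in sorted(groups):
        for out_key, out_value in reducer(k, groups[k]):
            results[out_key] = out_value
    return results


if __name__ == "__main__":
    sample = ["to be or not to be", "that is the question"]
    for word, count in run_local(sample).items():
        print(word, count)
```

The grouping step in `run_local` stands in for the shuffle that the framework performs between the map and reduce phases; on a cluster that step is distributed across machines, which is what makes the same two small functions scale to data sets such as the Gutenberg texts listed above.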
Break (3:00pm)
Note: An SSH client will be needed for the hands-on exercise in the second half.
- Macintosh and Linux users -- included with standard setup
- Windows users -- available applications include PuTTY and WinSCP.
Part 2 -- Intermediate and Advanced Courses
This part of the workshop uses WMR to explore use of map-reduce computing in more advanced courses, and examines the relationship between the WMR interface and the Hadoop computations it performs.
- More on WebMapReduce (3:30pm)
  - WebMapReduce and its architecture; obtaining and installing WMR
    Resources: WMR SourceForge site; admin page
  - Examples: using WMR and map-reduce in upper-division (undergraduate) courses.
    Resources: Module, Concurrency and Map-Reduce Strategies in Various Programming Languages
- Using Hadoop directly (3:45pm)
  - Overview of the Hadoop implementation of map-reduce.
  - Examples of Hadoop code
  - Hands-on exercises (25 min)
    Resources: Running Hadoop Java code for word count and other examples; Hadoop documentation (stable release 1.2.1)
- Use of map-reduce in undergraduate research projects -- examples
- "Big data": What is it? Map-reduce vs. databases, structured vs. unstructured data.
Please complete our own short survey for grant assessment purposes.