SC13 CSinParallel Workshop:
Using map-reduce to teach data-intensive scalable computing across the CS curriculum
Wednesday 1:30-5:00 pm, November 20, 2013
Dick Brown, St. Olaf College; Libby Shoop, Macalester College; Joel Adams, Calvin College
SC13 Map-Reduce Workshop Part 1 -- slides (Acrobat (PDF) 2.5MB Nov20 13)
Map-reduce, the cornerstone computational framework for cloud computing applications, has star appeal to draw students to the study of parallelism. Participants will carry out hands-on exercises designed for students at CS1/intermediate/advanced levels that introduce data-intensive scalable computing concepts, using WebMapReduce (WMR), a simplified open-source interface to the widely used Hadoop map-reduce programming environment, and using Hadoop itself. These hands-on exercises enable students to perform data-intensive scalable computations carried out on the most widely deployed map-reduce framework, used by Facebook, Microsoft, Yahoo, and other companies. WMR supports programming in a choice of languages (including Java, Python, C++, C#, Scheme); participants will be able to try exercises with languages of their choice. Workshop includes brief introduction to direct Hadoop programming, and information about access to cluster resources supporting WMR. Workshop materials will reside on csinparallel.org, along with WMR software. Intended audience: CS instructors. Laptop required (Windows, Mac, or Linux).
Part 1 -- Fundamentals; Introductory Courses
This first half of the workshop introduces map-reduce computing through the WebMapReduce (WMR) simplified interface to Hadoop, then shares our experience teaching map-reduce and related concepts of parallel and distributed computing to students in introductory sequences.
Introductory presentation (1:30pm)
Goals of the workshop.
Demonstration of WebMapReduce (WMR)
Resources: Why teach map-reduce with WMR?; WMR user's guide, including languages supported (so far)
Teaching map-reduce with WMR in the introductory sequence:
materials, teaching with frameworks, strategies, and experience
Resources: Module; direct link to teaching materials
Hands-on exercises (2:10pm)
Getting started with WMR
Resources: Intro to WMR module; see Using WMR, then Counting words with WMR (Python). Data sets on HDFS: /shared/gutenberg/CompleteShakespeare.txt, AnnaKarenina.txt, WarAndPeace.txt; /shared/gutenberg/all/group8
Alternative explorations: WMR code examples in various languages
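For readers who want to preview the word-count exercise before using WMR, the following is a minimal local sketch of the map-reduce pattern in plain Python. It does not use WMR's own interface: `run_map_reduce` is a hypothetical stand-in for the grouping work the framework performs between the two phases, and the sketch assumes each line of input reaches the mapper as the key with an empty value.

```python
from collections import defaultdict


def mapper(key, value):
    """Map phase: yield a (word, 1) pair for every word in one input line.

    Assumption: the whole input line arrives as `key`; `value` is unused.
    """
    for word in key.lower().split():
        yield word, 1


def reducer(key, values):
    """Reduce phase: add up all the counts collected for one word."""
    yield key, sum(values)


def run_map_reduce(lines):
    """Hypothetical local driver: map each line, group by key, then reduce."""
    groups = defaultdict(list)
    for line in lines:
        for k, v in mapper(line, ""):
            groups[k].append(v)      # "shuffle": gather values by key

    results = {}
    for k in sorted(groups):         # keys reach the reducer in sorted order
        for out_key, out_value in reducer(k, groups[k]):
            results[out_key] = out_value
    return results


counts = run_map_reduce(["to be or not to be", "that is the question"])
print(counts["to"], counts["be"], counts["question"])  # prints: 2 2 1
```

On a real cluster the mapper and reducer calls run in parallel on many machines, and only the two small functions are written by the student; the driver here exists only so the sketch can run on a laptop.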
Note: An SSH client will be needed for the hands-on exercise in the second half.
- Macintosh and Linux users -- included with standard setup
- Windows users -- available applications include PuTTY and WinSCP
Part 2 -- Intermediate and Advanced Courses
This part of the workshop uses WMR to explore use of map-reduce computing in more advanced courses, and examines the relationship between the WMR interface and the Hadoop computations it performs.
More on WebMapReduce (3:30pm)
WebMapReduce and its architecture; obtaining and installing WMR
Resources: WMR SourceForge site; admin page
Examples: using WMR and map-reduce in upper-division (undergraduate) courses.
Resources: Module, Concurrency and Map-Reduce Strategies in Various Programming Languages
- Using Hadoop directly (3:45pm)
Overview of the Hadoop implementation of map-reduce.
Examples of Hadoop code
Hands-on exercises (25 min)
Resources: Running Hadoop Java code for word count and other examples; Hadoop documentation (stable release 1.2.1)
- Use of map-reduce in undergraduate research projects -- examples
- "Big data": What is it? Map-reduce vs. databases, structured vs. unstructured data.
Please complete our own short survey for grant assessment purposes.