Initial Publication Date: February 9, 2016

SIGCSE 2016 Workshop 307:

CSinParallel: Using WebMapReduce to teach parallel computing concepts, hands-on

7 PM - 10 PM, Friday March 4, 2016
Room L11-L12

Instructors: Dick Brown (St. Olaf College), Libby Shoop (Macalester College), and Joel Adams (Calvin College)

Abstract: Map-reduce computation is the on-ramp to data-intensive cloud computing, and arguably the most widely deployed form of parallel/distributed computing. Participants will carry out exercises designed for students at CS1, intermediate, and advanced levels that introduce data-intensive scalable computing concepts using WebMapReduce (WMR), a simplified open-source interface to the dominant Hadoop map-reduce programming environment. WMR supports programming in a choice of languages including Python, Java, C++, and C#. Besides a hands-on experience with introductory teaching materials, the workshop includes an overview of teaching advanced map-reduce programming using WMR, and a comparison of WMR to direct Hadoop programming. All materials will reside on csinparallel.org, and the demonstration WMR system is reservable for participants' courses. Intended audience: CS instructors. Web-enabled laptop required.

Schedule

  • The first half of the workshop introduces map-reduce computing through the WebMapReduce (WMR) simplified interface to Hadoop, then shares our experience teaching map-reduce and related concepts of parallel and distributed computing to students in introductory sequences.
  • The second half of the workshop uses WMR to explore the use of map-reduce computing in more advanced courses, and examines the relationship between the WMR interface and the Hadoop computations it performs.

7-7:30 PM: Introductory courses

Slides: SIGCSE 2016 Workshop 307 slides, Part 1 (PowerPoint 2007 (.pptx) 407kB Mar4 16)

Welcome, goals

Introduction to map-reduce computing; demo of WebMapReduce (WMR)

Resources: Why teach map-reduce with WMR?; WMR users guide, including some supported languages

Teaching map-reduce with WMR in the introductory sequence: materials, strategies, and experience

Kinesthetic map-reduce class activity

A minimal set of teaching materials for 1-3 days

An extended set of exercises

Resources: Module

7:30-8:30 PM: Hands-on exercises

Getting started with WMR

Introductory exercises

Note: We suggest opening these exercises in a new tab or window in your browser, with WMR itself in a separate browser window, and this workshop page in another window/tab.

Resources that students can use in courses and that you can try tonight:

Starting Point: Intro to WMR module; see Using WMR, then Counting words with WMR (Python)

Data sets on HDFS: /shared/gutenberg/CompleteShakespeare.txt, AnnaKarenina.txt, WarAndPeace.txt

  • Note: Please avoid large Gutenberg "groups" for this workshop

Alternative explorations: WMR code examples in various languages; extended exercise set
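For orientation before the hands-on session, the word-count computation used in the introductory exercises can be sketched in plain Python. This is a local illustration of the map, group, and reduce stages only, not WMR's actual interface: the names `mapper`, `reducer`, and `run_map_reduce` and the two-line sample text are assumptions made for this sketch, and in WMR the grouping stage is carried out by Hadoop rather than by user code.

```python
from collections import defaultdict


def mapper(line):
    """Map step: yield a (word, 1) pair for each word in one line of text."""
    for word in line.lower().split():
        word = word.strip(".,;:!?\"'()[]")  # drop surrounding punctuation
        if word:
            yield word, 1


def reducer(word, counts):
    """Reduce step: combine all counts emitted for a single word."""
    return word, sum(counts)


def run_map_reduce(lines):
    """Hypothetical local stand-in for the framework: map, group by key, reduce."""
    groups = defaultdict(list)
    for line in lines:
        for key, value in mapper(line):
            groups[key].append(value)  # the "shuffle": collect values per key
    return dict(reducer(key, values) for key, values in sorted(groups.items()))


# Made-up sample input; the workshop exercises use the Gutenberg texts on HDFS.
sample = ["To be, or not to be,", "that is the question."]
word_counts = run_map_reduce(sample)
print(word_counts)
```

The mapper and reducer each see only one line or one key at a time, which is what allows a framework to run many copies of them in parallel on different parts of the data.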

Break (take a break for refreshments at any time)

8:30-9:00 PM: Intermediate and advanced courses

Slides: SIGCSE 2016 Workshop 307 slides, Part 2 (PowerPoint 2007 (.pptx) 214kB Mar4 16)

WebMapReduce and its architecture; obtaining and installing WMR

Resources: WMR SourceForge site; admin page

Examples: using WMR and map-reduce in upper-division (undergraduate) courses and projects. Appropriate uses of map-reduce.

CS1-level examples at a faster pace, presenting map-reduce programming techniques.

Overview of the Hadoop implementation of map-reduce.

Resources: Hadoop code for word count; Hadoop documentation

9:00-9:40 PM: Hands-on exercises

Assortment of WMR exercises

Resources that students can work through and you can try tonight:

Movie Data

Intro to WMR module; see WMR Activities

Data set on HDFS: /shared/MovieLens2

Traffic Data Analysis

UK Detailed Longitudinal Traffic Accident Data

Network Analysis using Flixster Data

Network Analysis using a social network dataset

The Million Song Dataset

Analyzing the lastFM song data

9:40-10:00 PM: Discussion and feedback

Assessment: Please fill out the SIGCSE online form for this workshop.

Also, please complete our own short survey for grant assessment purposes. We appreciate your help in evaluating our NSF-funded project.

