Initial Publication Date: August 4, 2015

WebMapReduce workshop session page

WebMapReduce (WMR) is a strategically simplified web interface for launching map-reduce computations on the prominent open-source Hadoop framework or a comparable map-reduce framework. Many of today's cloud computing applications, including widely used web-based services (notably Facebook and services run by many other Fortune 50 companies), frequently rely on Hadoop or a comparable system. WMR makes such computations accessible to undergraduates as early as the introductory CS course. WMR supports map-reduce computations in numerous programming languages, including the most common introductory languages.

This workshop session page provides a quick hands-on approach to introducing WMR, using WMR in introductory courses, and employing WMR and map-reduce computation in more advanced settings.

Contents

Getting started
Introductory WMR in courses
More advanced topics

Appendix

Getting Started

Map-reduce computing
Resources: Why teach map-reduce with WMR?; WMR user's guide, including the languages supported so far

Concept of action; key-value pairs; framework computation; data parallelism
Splitting input; sorting feature; fault tolerance; scalability; streaming and iterators
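
The data flow behind these concepts can be illustrated without a cluster. The following is a minimal sketch in plain Python, not WMR or Hadoop code: `run_map_reduce`, `word_mapper`, and `sum_reducer` are hypothetical names chosen for this illustration. Everything runs in one process, so the sketch shows only the flow of key-value pairs (map each input record, sort and group by key, reduce each group through an iterator), not input splitting across machines, fault tolerance, or scalability.

```python
from itertools import groupby
from operator import itemgetter

def word_mapper(line):
    """Map phase: turn one input line into (word, 1) pairs."""
    for word in line.split():
        yield (word, 1)

def sum_reducer(key, values):
    """Reduce phase: combine all values that share one key."""
    yield (key, sum(values))

def run_map_reduce(lines, mapper, reducer):
    """Hypothetical single-process stand-in for the work a framework does."""
    # Map: each input record is processed independently (data parallelism).
    pairs = [pair for line in lines for pair in mapper(line)]
    # Shuffle/sort: sorting by key makes equal keys adjacent.
    pairs.sort(key=itemgetter(0))
    results = []
    # Reduce: one reducer call per distinct key, fed an iterator of its values.
    for key, group in groupby(pairs, key=itemgetter(0)):
        values = (value for _, value in group)
        results.extend(reducer(key, values))
    return results

if __name__ == "__main__":
    print(run_map_reduce(["the cat", "the dog"], word_mapper, sum_reducer))
    # prints [('cat', 1), ('dog', 1), ('the', 2)]
```

Because the mapper sees one record at a time and the reducer sees one key at a time, a real framework is free to spread both phases over many machines.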

Demonstration of WMR

http://csinparallel.stolaf.edu/wmr (broken link) - Logging in; the user interface
Example mapper and reducer code for word frequency counting, Python3 language
Choice of data set
The Test interface; Submitting to Hadoop
Performance and scale

Getting started with WMR

Creating a WMR account
Computing word frequencies
Variations: ignoring case and punctuation; ordering by frequency; KWIC/concordance
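
The first two variations might be sketched as follows. This is an illustration under stated assumptions, not the workshop's own solution. The `Wmr` class here is a hypothetical local stand-in that only records emitted pairs (on WMR itself, `Wmr` is supplied by the system), and the names `clean_mapper`, `swap_mapper`, and `identity_reducer` are made up for this example. The second pass assumes that its input is the first pass's output, with the word arriving as the key and the count as the value, and that keys are sorted as strings, which is why the count is zero-padded to an arbitrary width of 10.

```python
import string

class Wmr:
    """Hypothetical local stand-in; records pairs instead of sending them to a framework."""
    emitted = []

    @classmethod
    def emit(cls, key, value):
        cls.emitted.append((key, value))

def clean_mapper(key, value):
    # Pass 1 mapper: lower-case each word and strip surrounding punctuation.
    for word in key.split():
        word = word.lower().strip(string.punctuation)
        if word:  # skip tokens that consisted only of punctuation
            Wmr.emit(word, '1')

def swap_mapper(key, value):
    # Pass 2 mapper: make the count the key so that sorting orders by frequency.
    # Zero-padding keeps string order equal to numeric order (assumed width 10).
    Wmr.emit(value.zfill(10), key)

def identity_reducer(key, values):
    # Pass 2 reducer: emit every word that has this count.
    for word in values:
        Wmr.emit(key, word)

if __name__ == "__main__":
    clean_mapper('The cat, the "Dog"!', '')
    print(Wmr.emitted)
    # prints [('the', '1'), ('cat', '1'), ('the', '1'), ('dog', '1')]
```

The first pass would pair `clean_mapper` with the usual summing reducer. With string-sorted keys, the second pass lists the least frequent words first.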

Quick overview of using WMR/map-reduce in courses: Intro module; advanced topics

Introductory WMR in courses

Goals of the session
Teaching map-reduce computing with WMR in the introductory sequence: materials, teaching with frameworks, strategies, and experience
Resources: Module; direct link to teaching materials
Exercises
Resources: Intro to WMR module; see Using WMR, then Counting words with WMR (Python)
Data sets on HDFS: /shared/gutenberg/CompleteShakespeare.txt, AnnaKarenina.txt, WarAndPeace.txt; /shared/gutenberg/all/group8
Alternative explorations: WMR code examples in various languages; Patterns and Exemplars

Appendix

Example mapper and reducer code for computing word frequencies in Python3

Mapper:

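def mapper(key, value):
    # The text of the input line arrives in key; split it on whitespace
    words = key.split()
    for word in words:
        # Emit a count of one (as a string) for every occurrence of a word
        Wmr.emit(word, '1')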

Reducer:

def reducer(key, iter):
    # iter yields the string counts emitted for this word
    total = 0
    for s in iter:
        total = total + int(s)
    Wmr.emit(key, str(total))

More WMR examples

Examples in four languages, with example data (tested in Aug 2015)

wc = word frequency count
id = identity
index = index of words within values
conc = concordance/KWIC

Further examples (untested in Aug 2015)

wmr-intro:
wc = word frequency count
co = concordance/KWIC
cw = word frequency count ordered by frequency instead of by word (2 passes)
id = identity
in = index of words within values
mc = frequency of ratings among all movies, for netflix data
```
    line format:   movie-id,reviewer-id,rating(1-5),date
    example:       201,14,3,2005-09-06
    120MB test data at cluster path /shared/netflix/test/testM.txt
```
ma = average rating per movie sorted into bins, for netflix data (2 passes)
wmr-adv:
mc = combiner implemented within mapper
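
For the wmr-intro `mc` example (frequency of ratings in the Netflix data), a mapper and reducer could look roughly like the sketch below. It is not the tested example code from the archive. It assumes that each comma-separated line arrives whole in `key`, as in the word frequency mapper, and it uses a hypothetical local `Wmr` stand-in so that it can run outside WMR. The names `rating_mapper` and `rating_reducer` are invented here, and apart from the first sample line, which comes from the line format above, the input lines are made up.

```python
class Wmr:
    """Hypothetical local stand-in for the emit interface; it only records pairs."""
    emitted = []

    @classmethod
    def emit(cls, key, value):
        cls.emitted.append((key, value))

def rating_mapper(key, value):
    # Line format: movie-id,reviewer-id,rating(1-5),date
    fields = key.strip().split(',')
    if len(fields) != 4:
        return  # skip malformed lines rather than fail the whole job
    movie_id, reviewer_id, rating, date = fields
    # Key on the rating so the reducer sees every occurrence of that rating.
    Wmr.emit(rating, '1')

def rating_reducer(key, values):
    # Sum the string counts for one rating value.
    total = 0
    for v in values:
        total += int(v)
    Wmr.emit(key, str(total))

if __name__ == "__main__":
    sample = ["201,14,3,2005-09-06", "201,15,5,2005-09-07", "305,14,3,2005-10-01"]
    for line in sample:
        rating_mapper(line, '')
    print(Wmr.emitted)
    # prints [('3', '1'), ('5', '1'), ('3', '1')]
```

Since there are only five possible ratings, each reducer call receives a very long list of ones, which is the kind of load that combining counts inside the mapper is meant to reduce.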