WebMapReduce workshop session page
WebMapReduce (WMR) is a strategically simplified web interface for launching map-reduce computations using the prominent open-source Hadoop framework or a comparable map-reduce framework. Today's cloud computing applications, including many widely used web-based services (notably Facebook and many other Fortune 50 companies), frequently rely on Hadoop or a comparable system. WMR makes such computations accessible to undergraduates as early as the introductory CS course. WMR supports map-reduce computations in numerous programming languages, including the most common introductory languages.
This workshop session page provides a quick hands-on approach to introducing WMR, using WMR in introductory courses, and employing WMR and map-reduce computation in more advanced settings.
Contents
Getting Started
- Map-reduce computing
Resources: Why teach map-reduce with WMR?; WMR users guide, including the languages supported (so far)
- Concept of action; key-value pairs; framework computation; data parallelism
- Splitting input; sorting feature; fault tolerance; scalability; streaming and iterators
- Demonstration of WMR
- http://csinparallel.stolaf.edu/wmr (broken link) - Logging in; the user interface
- Example mapper and reducer code for word frequency counting, Python3 language
- Choice of data set
- The Test interface; Submitting to Hadoop
- Performance and scale
- Getting started with WMR
- Creating a WMR account
- Computing word frequencies
- Variations: ignoring case and punctuation; ordering by frequency; KWIC/concordance
- Quick overview of using WMR/map-reduce in courses: Intro module; advanced topics
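As a rough illustration of the "ignoring case and punctuation" variation, the following Python3 sketch adapts the word-frequency mapper from the Appendix. The `Wmr` class here is a hypothetical local stand-in for the object that WMR itself supplies, added only so the sketch can run outside WMR. It assumes, as the Appendix mapper does, that each line of text arrives in `key`.

```python
import string

class Wmr:
    """Hypothetical local stand-in for the Wmr object that WMR supplies."""
    emitted = []

    @staticmethod
    def emit(key, value):
        Wmr.emitted.append((key, value))

def mapper(key, value):
    # Assumption: the line of text arrives in `key`, as in the Appendix example.
    for word in key.split():
        # Lowercase, then trim punctuation from both ends of the token.
        cleaned = word.lower().strip(string.punctuation)
        if cleaned:  # skip tokens that consisted only of punctuation
            Wmr.emit(cleaned, '1')

mapper("The cat, the CAT!", "")
print(Wmr.emitted)
# [('the', '1'), ('cat', '1'), ('the', '1'), ('cat', '1')]
```

This version only trims punctuation at the ends of each token, so a word such as "don't" keeps its apostrophe. The reducer from the Appendix can be used unchanged.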
Introductory WMR in courses
- Goals of the session
- Teaching map-reduce computing with WMR in the introductory sequence:
materials, teaching with frameworks, strategies, and experience
Resources: Module; direct link to teaching materials
- Exercises
Resources: Intro to WMR module; see Using WMR, then Counting words with WMR (Python)
Data sets on HDFS: /shared/gutenberg/CompleteShakespeare.txt, AnnaKarenina.txt, WarAndPeace.txt; /shared/gutenberg/all/group8
Alternative explorations: WMR code examples in various languages
- Patterns and Exemplars
More advanced topics
- Teaching map-reduce programming techniques using WMR
- WebMapReduce and its architecture; obtaining and installing WMR
Resources: WMR sourceforge site; admin page
- Examples: using WMR and map-reduce in upper-division (undergraduate) courses.
Resources: Module, Concurrency and Map-Reduce Strategies in Various Programming Languages
- Using Hadoop directly
- Overview of the Hadoop implementation of map-reduce
- Examples of Hadoop code
- Hands-on exercises
Resources: Running Hadoop Java code for word count and other examples (this site may be offline); Hadoop documentation (stable release 1.2.1)
- Applications
- Use of map-reduce in undergraduate research projects -- example
- "Big data": What is it? Map-reduce vs. databases, structured vs. unstructured data
Appendix
Example mapper and reducer code for computing word frequencies in Python3
Mapper:
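def mapper(key, value):
    # Each input line arrives in key; split it into words
    words = key.split()
    for word in words:
        # Emit a count of 1 for every occurrence of the word
        Wmr.emit(word, '1')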
Reducer:
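def reducer(key, iter):
    # iter yields every count (as a string) emitted for this key
    sum = 0
    for s in iter:
        sum = sum + int(s)
    # Emit the word with its total count, once per key
    Wmr.emit(key, str(sum))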
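To try the mapper and reducer above without a WMR account, the framework's role can be approximated in memory. The sketch below is not how WMR or Hadoop is implemented. It only mimics the three steps of a single pass (map, sort by key, reduce) on a small list of lines. The `Wmr` class and the `run_local` function are hypothetical names introduced for this illustration, and passing each line as the key with an empty value is an assumption that mirrors the mapper reading its text from `key`.

```python
from itertools import groupby
from operator import itemgetter

class Wmr:
    """Hypothetical stand-in for the emit interface that WMR provides."""
    out = []

    @staticmethod
    def emit(key, value):
        Wmr.out.append((key, value))

def mapper(key, value):
    for word in key.split():
        Wmr.emit(word, '1')

def reducer(key, iter):
    total = 0
    for s in iter:
        total += int(s)
    Wmr.emit(key, str(total))

def run_local(lines):
    """Simulate one map-reduce pass in memory: map, sort by key, reduce."""
    # Map phase: each input line is passed as the key with an empty value.
    Wmr.out = []
    for line in lines:
        mapper(line, '')
    # Shuffle phase: sort the emitted pairs so that equal keys are adjacent.
    pairs = sorted(Wmr.out, key=itemgetter(0))
    # Reduce phase: call the reducer once per distinct key.
    Wmr.out = []
    for key, group in groupby(pairs, key=itemgetter(0)):
        reducer(key, (value for _, value in group))
    return Wmr.out

print(run_local(["to be or not", "to be"]))
# [('be', '2'), ('not', '1'), ('or', '1'), ('to', '2')]
```

A real framework distributes the map and reduce calls across many machines and handles the sorting and grouping itself. The sketch keeps everything in one process so the data flow is easy to follow.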
More WMR examples
- Examples in four languages, with example data (tested in Aug 2015)
- wc = word frequency count
id = identity
index = index of words within values
conc = concordance/KWIC
- Further examples (untested in Aug 2015)
- wmr-intro:
wc = word frequency count
co = concordance/KWIC
cw = word frequency count ordered by frequency instead of by word (2 passes)
id = identity
in = index of words within values
mc = frequency of ratings among all movies, for Netflix data
line format: movie-id,reviewer-id,rating(1-5),date
example: 201,14,3,2005-09-06
120MB test data at cluster path /shared/netflix/test/testM.txt
ma = average rating per movie sorted into bins, for Netflix data (2 passes)
- wmr-adv:
mc = combiner implemented within mapper
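For a sense of what a rating-frequency computation over the line format listed above (movie-id,reviewer-id,rating(1-5),date) might look like, here is a minimal Python3 sketch. It is an illustration written for this page, not the code of the mc example itself. The `Wmr` class and `count_ratings` function are hypothetical local stand-ins, the sketch assumes the whole comma-separated line arrives in `key`, and every sample line other than the documented example `201,14,3,2005-09-06` is made up for the demonstration.

```python
from collections import defaultdict

class Wmr:
    """Hypothetical local stand-in for the emit interface that WMR supplies."""
    out = []

    @staticmethod
    def emit(key, value):
        Wmr.out.append((key, value))

def mapper(key, value):
    # Assumption: the whole comma-separated line arrives in `key`.
    fields = key.strip().split(',')
    if len(fields) == 4:          # skip malformed lines
        Wmr.emit(fields[2], '1')  # third field is the rating (1-5)

def reducer(key, iter):
    total = 0
    for s in iter:
        total += int(s)
    Wmr.emit(key, str(total))

def count_ratings(lines):
    """Run one in-memory map-reduce pass over the given lines."""
    Wmr.out = []
    for line in lines:
        mapper(line, '')
    # Group the emitted counts by rating, as the framework would.
    grouped = defaultdict(list)
    for rating, one in Wmr.out:
        grouped[rating].append(one)
    Wmr.out = []
    for rating in sorted(grouped):
        reducer(rating, grouped[rating])
    return Wmr.out

# First line is the documented example; the others are invented sample data.
sample = ["201,14,3,2005-09-06", "201,15,5,2005-09-07", "202,14,3,2005-09-08"]
print(count_ratings(sample))
# [('3', '2'), ('5', '1')]
```

Within WMR only the `mapper` and `reducer` functions would be submitted. The stub and the driver exist solely to exercise them locally.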