WebMapReduce workshop session page

WebMapReduce (WMR) is a strategically simplified web interface for launching map-reduce computations using the prominent open-source Hadoop framework or a comparable map-reduce framework. Many of today's cloud computing applications, including widely used web-based services (notably Facebook and those of many other Fortune 50 companies), rely on Hadoop or a comparable system. WMR makes such computations accessible to undergraduates as early as the introductory CS course. WMR supports map-reduce computations in numerous programming languages, including the most common introductory languages.

This workshop session page provides a quick hands-on approach to introducing WMR, using WMR in introductory courses, and employing WMR and map-reduce computation in more advanced settings.

Contents

Getting Started

  1. Map-reduce computing
    Resources: Why teach map-reduce with WMR?; WMR users guide, including the languages supported (so far)
    • Concept of action; key-value pairs; framework computation; data parallelism
    • Splitting input; sorting feature; fault tolerance; scalability; streaming and iterators
  2. Demonstration of WMR
  3. Getting started with WMR
    • Creating a WMR account
    • Computing word frequencies
    • Variations: ignoring case and punctuation;
      ordering by frequency; KWIC/concordance
  4. Quick overview of using WMR/map-reduce in courses: Intro module; advanced topics
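The phases listed under "Map-reduce computing" (map, sort by key, reduce over an iterator) can be tried without a cluster. The following is only a minimal local sketch, not WMR's or Hadoop's implementation: run_map_reduce is a hypothetical helper written for this page, the mapper and reducer return pairs with yield rather than calling WMR's emit, and each input line is assumed to arrive in the key. The mapper also shows the "ignoring case and punctuation" variation.

```python
import string
from itertools import groupby
from operator import itemgetter


def mapper(key, value):
    """Yield (word, 1) for each word in the line, ignoring case and punctuation."""
    for raw in key.split():
        word = raw.strip(string.punctuation).lower()
        if word:
            yield word, 1


def reducer(key, values):
    """Sum the counts that the sort step grouped under one key."""
    yield key, sum(values)


def run_map_reduce(lines, mapper, reducer):
    """Hypothetical local stand-in for the framework: map, sort by key, reduce."""
    # Map phase: each input line is passed as the key; the value is empty.
    pairs = [pair for line in lines for pair in mapper(line, "")]
    # Shuffle phase: sorting brings equal keys next to each other.
    pairs.sort(key=itemgetter(0))
    # Reduce phase: one reducer call per distinct key, fed by an iterator.
    results = []
    for key, group in groupby(pairs, key=itemgetter(0)):
        results.extend(reducer(key, (v for _, v in group)))
    return results


if __name__ == "__main__":
    text = ["To be, or not to be:", "that is the question."]
    for word, count in run_map_reduce(text, mapper, reducer):
        print(word, count)
```

Because the pairs are sorted before reduction, the output comes back ordered by word, which is the sorting feature noted above. A real framework runs the map and reduce calls in parallel on many machines; this sketch runs them in one process.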

Introductory WMR in courses

  1. Goals of the session
  2. Teaching map-reduce computing with WMR in the introductory sequence:
    materials, teaching with frameworks, strategies, and experience
    Resources: Module; direct link to teaching materials
  3. Exercises
    Resources: Intro to WMR module; see Using WMR, then Counting words with WMR (Python)
    Data sets on HDFS: /shared/gutenberg/CompleteShakespeare.txt, AnnaKarenina.txt, WarAndPeace.txt; /shared/gutenberg/all/group8
    Alternative explorations: WMR code examples in various languages
  4. Patterns and Exemplars

More advanced topics

  1. Teaching map-reduce programming techniques using WMR
  2. Using Hadoop directly
  3. Applications
    • Use of map-reduce in undergraduate research projects -- example
    • "Big data": What is it? Map-reduce vs. databases, structured vs. unstructured data

Appendix

Example mapper and reducer code for computing word frequencies in Python 3

Mapper:

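def mapper(key, value):
    # The input text arrives in key; value is unused here.
    words = key.split()
    for word in words:
        # Emit one (word, '1') pair per occurrence.
        Wmr.emit(word, '1')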

Reducer:

def reducer(key, values):
    # values iterates over the string counts emitted for this key.
    total = 0
    for s in values:
        total = total + int(s)
    Wmr.emit(key, str(total))
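The mapper and reducer above can be checked on a laptop before submitting them to WMR. This sketch assumes a stand-in Wmr class that only records emitted pairs; it is hypothetical and not the object WMR itself supplies, and run_locally is likewise a helper invented for illustration.

```python
class Wmr:
    """Hypothetical stand-in for the Wmr object; it only records emitted pairs."""
    emitted = []

    @classmethod
    def emit(cls, key, value):
        cls.emitted.append((key, value))


def mapper(key, value):
    # Same logic as the word-frequency mapper on this page.
    for word in key.split():
        Wmr.emit(word, '1')


def reducer(key, values):
    # Same logic as the word-frequency reducer on this page.
    total = 0
    for s in values:
        total += int(s)
    Wmr.emit(key, str(total))


def run_locally(lines):
    """Run the map step, group pairs by key, then run the reduce step."""
    Wmr.emitted = []
    for line in lines:
        mapper(line, '')
    # Group the mapper output by key, as the framework would.
    grouped = {}
    for key, value in Wmr.emitted:
        grouped.setdefault(key, []).append(value)
    # Reset the record so it holds only reducer output.
    Wmr.emitted = []
    for key in sorted(grouped):
        reducer(key, iter(grouped[key]))
    return Wmr.emitted


if __name__ == "__main__":
    for word, count in run_locally(["the cat", "the hat"]):
        print(word, count)
```

Keys and values stay strings throughout, matching the '1' and str(...) conversions in the page's code.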

More WMR examples

  • Examples in four languages, with example data (tested in Aug 2015)
    • wc = word frequency count
      id = identity
      index = index of words within values
      conc = concordance/KWIC
  • Further examples (untested in Aug 2015)
    • wmr-intro:
      wc = word frequency count
      co = concordance/KWIC
      cw = word frequency count ordered by frequency instead of by word (2 passes)
      id = identity
      in = index of words within values
      mc = frequency of ratings among all movies, for Netflix data
          line format: movie-id,reviewer-id,rating(1-5),date
          example: 201,14,3,2005-09-06
          120MB test data at cluster path /shared/netflix/test/testM.txt

      ma = average rating per movie sorted into bins, for Netflix data (2 passes)
    • wmr-adv:
      mc = combiner implemented within mapper
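The two mc entries above (rating frequencies, and a combiner implemented within the mapper) can be illustrated together. The sketch below is plain Python under stated assumptions and is not the code from the wmr-intro or wmr-adv examples: map_split and reduce_counts are hypothetical names, a "split" is simply a list of lines, and only the first sample row comes from this page (the others are made up for illustration). Instead of emitting one (rating, 1) pair per line, the mapper tallies ratings locally and emits one partial count per rating.

```python
from collections import Counter


def map_split(lines):
    """Tally ratings within one input split; return one pair per rating.

    Assumes the line format movie-id,reviewer-id,rating,date.
    """
    local = Counter()
    for line in lines:
        fields = line.strip().split(",")
        if len(fields) != 4:
            continue  # skip lines that do not match the assumed format
        local[fields[2]] += 1
    # In-mapper combining: partial counts replace per-line (rating, 1) pairs.
    return sorted(local.items())


def reduce_counts(rating, partial_counts):
    """Sum the partial counts for one rating across all splits."""
    return rating, sum(partial_counts)


if __name__ == "__main__":
    split_a = ["201,14,3,2005-09-06", "201,15,5,2005-09-07"]  # second row is made up
    split_b = ["305,14,3,2005-10-01"]  # made-up row
    partials = map_split(split_a) + map_split(split_b)
    # Group partial counts by rating, as the framework's sort step would.
    by_rating = {}
    for rating, count in partials:
        by_rating.setdefault(rating, []).append(count)
    for rating in sorted(by_rating):
        print(reduce_counts(rating, by_rating[rating]))
```

The design trade-off is memory for traffic: the mapper holds a small table (at most five ratings here) and sends far fewer pairs to the reducers than the one-pair-per-line approach.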