WMR Exemplar: LastFM million-song dataset

Jeffrey Lyman (edited by Libby Shoop), Macalester College


This module demonstrates how Hadoop and WMR can be used to analyze the LastFM million-song dataset. It incorporates several advanced Hadoop techniques, such as job chaining and multiple inputs. The module provides one code example, after which students are encouraged to try additional analyses on their own. Students should know how to use the WMR Hadoop interface before beginning this module.

Learning Goals

Given an example of a map-reduce computation using multiple data files as input and chaining of jobs, students will be able to complete their own analyses of the data to answer different questions.
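
As a rough illustration of what "multiple data files as input" and "chaining of jobs" mean, the sketch below simulates two chained map-reduce passes in plain Python. It does not use the WMR or Hadoop APIs, and it is not the module's own example: the `run_job` helper, the `meta:`/`plays:` value tags, and the sample records are all hypothetical, invented only to show the pattern. The first job joins records from two inputs on a shared track ID; the second job takes the first job's output as its input and totals the values per artist.

```python
from collections import defaultdict


def run_job(mapper, reducer, records):
    """Simulate one map-reduce pass locally: map, group by key, reduce."""
    groups = defaultdict(list)
    for key, value in records:
        for out_key, out_value in mapper(key, value):
            groups[out_key].append(out_value)
    output = []
    for key in sorted(groups):
        output.extend(reducer(key, groups[key]))
    return output


# Job 1: join two (hypothetical) inputs on track ID.
def join_mapper(key, value):
    # Both inputs are already keyed by track ID, so pass records through.
    yield key, value


def join_reducer(key, values):
    artist = None
    plays = 0
    for v in values:
        # The tag before the colon says which input the record came from.
        tag, _, payload = v.partition(":")
        if tag == "meta":
            artist = payload
        elif tag == "plays":
            plays += int(payload)
    if artist is not None:
        yield artist, str(plays)


# Job 2: consume job 1's output and total the counts per artist.
def sum_mapper(key, value):
    yield key, value


def sum_reducer(key, values):
    yield key, str(sum(int(v) for v in values))


# Made-up sample records standing in for two separate data files.
metadata = [("T1", "meta:Artist A"), ("T2", "meta:Artist B"), ("T3", "meta:Artist A")]
plays = [("T1", "plays:10"), ("T2", "plays:4"), ("T3", "plays:5")]

joined = run_job(join_mapper, join_reducer, metadata + plays)  # multiple inputs
totals = run_job(sum_mapper, sum_reducer, joined)              # chained job
print(totals)
```

Tagging each value with its source is one common way to let a single reducer tell the inputs apart; the actual record layout of the converted dataset may differ.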

Context for Use

This example can be used after students have gained some familiarity with the map-reduce concept and have used WebMapReduce (WMR), for example by completing the introductory WMR module from this collection.

Description and Teaching Materials

You can visit the module in your browser:
Hadoop LastFM Analysis

Or you can download the module in PDF, LaTeX, or Word format.
PDF Format: HadoopLastFMAnalysis.pdf.
LaTeX Format: latex.tar.gz.

Teaching Notes and Tips

You will need access to the data on a system with Hadoop and WMR installed. Contact Libby Shoop, shoop at macalester dot edu, for more information about WMR access for educational purposes or to obtain the data example.

The additional analyses we suggest can be used as homework problems.


Assessment instrument not available.

References and Resources

Please see the WebMapReduce home page for how to obtain and install WMR.

The dataset was obtained from Columbia University's LabROSA. However, it has been converted into a format that is easier to work with on WMR. The edited dataset is also much smaller, since it doesn't include the audio analysis information. If you would like the smaller dataset for your own WMR cluster, please contact JLyman@macalester.edu or shoop@macalester.edu.
