WMR Exemplar: LastFM million-song dataset
Initial Publication Date: August 13, 2014
Summary
This module demonstrates how Hadoop and WMR can be used to analyze the LastFM Million Song Dataset. It incorporates several advanced Hadoop techniques, such as job chaining and multiple inputs. The module provides one code example and then encourages students to try additional analyses on their own. Students should know how to use the WMR Hadoop interface before beginning this module.
Learning Goals
Given an example of a map-reduce computation using multiple data files as input and chaining of jobs, students will be able to complete their own analyses of the data to answer different questions.
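To make the two techniques named above concrete, here is a minimal local sketch in Python. It is not the module's own code and does not use the WMR API: the run_job helper, the "meta" and "plays" record tags, and the tab-separated fields are all hypothetical, chosen only for illustration. The first job reads records from two inputs and joins them on a track id; the second job consumes the first job's output, which is what job chaining means.

```python
from collections import defaultdict


def run_job(mapper, reducer, records):
    """Hypothetical local stand-in for one map-reduce job: map, group by key, reduce."""
    groups = defaultdict(list)
    for key, value in records:
        for out_key, out_value in mapper(key, value):
            groups[out_key].append(out_value)
    output = []
    for key in sorted(groups):
        output.extend(reducer(key, groups[key]))
    return output


# Job 1: join two inputs on the track id (multiple inputs).
def join_mapper(tag, line):
    # tag says which input the line came from; the line format is an assumption.
    track_id, rest = line.split("\t", 1)
    yield track_id, (tag, rest)


def join_reducer(track_id, values):
    artist = None
    plays = 0
    for tag, rest in values:
        if tag == "meta":
            artist = rest
        elif tag == "plays":
            plays += int(rest)
    # Tracks with no metadata record are dropped.
    if artist is not None:
        yield artist, plays


# Job 2: total the per-track counts for each artist.
def sum_mapper(artist, plays):
    yield artist, plays


def sum_reducer(artist, counts):
    yield artist, sum(counts)


def plays_per_artist(metadata_lines, play_lines):
    """Chain the two jobs: the output of job 1 is the input of job 2."""
    records = [("meta", line) for line in metadata_lines]
    records += [("plays", line) for line in play_lines]
    joined = run_job(join_mapper, join_reducer, records)
    return run_job(sum_mapper, sum_reducer, joined)
```

On a real cluster each job would be submitted separately, with the second job's input set to the first job's output. The actual field layout of the converted dataset is described in the module itself.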
Context for Use
This example can be used after students have gained some familiarity with the map-reduce concept and have used WebMapReduce (WMR), for example by completing the introductory WMR module from this collection.
Description and Teaching Materials
You can visit the module in your browser:
Hadoop LastFM Analysis
Or you can download the module in PDF, LaTeX, or Word format.
PDF Format: HadoopLastFMAnalysis.pdf.
LaTeX Format: latex.tar.gz.
Teaching Notes and Tips
You will need access to the data on a system with Hadoop and WMR installed. Contact Libby Shoop, shoop at macalester dot edu, for more information about WMR access for educational purposes or to obtain the data example.
The additional analyses we suggest can be used as homework problems.
Assessment
Assessment instrument not available.
References and Resources
Please see the WebMapReduce home page for how to obtain and install WMR.
The dataset was obtained from Columbia University's LabROSA. However, it has been converted into a format that is easier to work with on WMR. The edited dataset is also much smaller, since it doesn't include the audio analysis information. If you would like the smaller dataset for your own WMR cluster, please contact JLyman@macalester.edu or shoop@macalester.edu.