User's Guide

Sphere and MapReduce

Any MapReduce program can be rewritten using Sphere. If the program has only a Map phase, it can be expressed as a simple Sphere process (i.e., each input element is processed independently by a slave). If both Map and Reduce are present, the "bucket" output in Sphere can be used to simulate the Reduce phase: Reduce merges all records with the same key, while Sphere sends all output records with the same bucket ID to the same file.
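As an illustration, a map-style Sphere UDF that routes its output by bucket ID might look like the sketch below. The struct definitions are simplified stand-ins for the real SInput/SOutput/SFile types declared in the Sector headers, and the key extraction is hypothetical; only the bucket-routing idea is the point.

#include <cstring>

// Simplified stand-ins for the Sphere UDF I/O structures; the real
// definitions are declared in the Sector/Sphere headers and may differ.
struct SInput  { char* m_pcUnit; int m_iRows; char* m_pcParam; int m_iPSize; };
struct SOutput { char* m_pcResult; int m_iBufSize; int m_iResSize;
                 int* m_piBktID; int m_iRows; };
struct SFile   { };

const int NUM_BUCKETS = 256;

// "Map plus shuffle" expressed as a plain Sphere UDF: copy the input record
// to the output unchanged, but tag it with a bucket ID derived from its key
// so that all records sharing a key end up in the same bucket file.
extern "C" int hashbucket(const SInput* input, SOutput* output, SFile* /*file*/)
{
   // In this sketch the record is assumed to be a null-terminated string.
   int len = (int)strlen(input->m_pcUnit) + 1;
   if (len > output->m_iBufSize)
      return -1;

   memcpy(output->m_pcResult, input->m_pcUnit, len);
   output->m_iResSize = len;

   // Hypothetical key: the first 10 bytes of the record.
   unsigned int h = 0;
   for (int i = 0; i < 10 && input->m_pcUnit[i] != '\0'; ++i)
      h = h * 31 + (unsigned char)input->m_pcUnit[i];

   output->m_piBktID[0] = (int)(h % NUM_BUCKETS);
   output->m_iRows = 1;
   return 0;
}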

Explicit MapReduce is also supported in Sector. A MapReduce program named "MROPERATION" can define some or all of four routines, which should be compiled into a single library file, similar to a Sphere routine. These four routines are equivalent to the MapReduce routines "map", "partition", "comparison", and "reduce" used in Hadoop. Note that there is no record parser (a.k.a. input reader); record parsing is handled by the record index, as described for Sphere programs.

If the map function is not defined, each input record is partitioned directly by the partition function. The partition function is similar to the bucket operation in Sphere. Note that Sector's MapReduce implementation does not use explicit key/value pairs; Sector passes the whole record as a "char*" to all of the processing functions.
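A minimal sketch of such a library is shown below for a hypothetical MROPERATION named "mysort". The routine names and prototypes here are illustrative assumptions only; the authoritative signatures are those used in the bundled examples (./examples/funcs/mr_sort.cpp and mr_word.cpp). The sketch is meant to show the structure: four routines in one library, each operating on whole records passed as "char*" buffers rather than key/value pairs.

#include <cstring>

// Hypothetical prototypes; check mr_sort.cpp / mr_word.cpp for the real ones.

// map (optional): rewrite one input record into an output record.
extern "C" int mysort_map(const char* record, int size,
                          char* result, int& rsize, int bufsize)
{
   if (size > bufsize) return -1;
   memcpy(result, record, size);   // identity map: pass the record through
   rsize = size;
   return 0;
}

// partition: decide which bucket (reducer) a record belongs to.
extern "C" int mysort_partition(const char* record, int /*size*/,
                                void* /*param*/, int /*psize*/)
{
   // Hypothetical: partition on the first byte of the record.
   return ((unsigned char)record[0]) % 16;
}

// comparison: order records inside each bucket before reduce is applied.
extern "C" int mysort_compare(const char* r1, int s1, const char* r2, int s2)
{
   int n = (s1 < s2) ? s1 : s2;
   int c = memcmp(r1, r2, n);
   return (c != 0) ? c : (s1 - s2);
}

// reduce: merge a run of records that compare as equal.
extern "C" int mysort_reduce(const char* records, int size,
                             char* result, int& rsize, int bufsize)
{
   if (size > bufsize) return -1;
   memcpy(result, records, size);  // identity reduce for illustration
   rsize = size;
   return 0;
}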

The following API can be used to execute a MapReduce routine. The parameter "mr" is the name of the MROPERATION; for example, if the MROPERATION is "terasort", then the value of "mr" should be "terasort".
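A client-side sketch of running an MROPERATION is given below. It is modeled on the Sphere client examples; the header name, the exact SphereStream/SphereProcess methods, and in particular the run_mr() parameter list should be checked against the headers shipped with your Sector release. The master address, user name, file paths, and the "mysort" operation are all hypothetical.

#include <sector.h>     // client header; the name/path may vary by release
#include <cstdlib>
#include <string>
#include <vector>

int main()
{
   Sector client;
   if (client.init("master_host", 6000) < 0)           // hypothetical master address/port
      return -1;
   if (client.login("test_user", "test_password") < 0)
      return -1;

   // Input stream: the files (or directories) to process.
   std::vector<std::string> files;
   files.push_back("input_data");                       // hypothetical input location
   SphereStream input;
   input.init(files);

   // Output stream: 16 bucket files; records with the same partition
   // value are written to the same bucket file.
   SphereStream output;
   output.setOutputPath("output_data", "mysort_bucket");
   output.init(16);

   SphereProcess* myproc = client.createSphereProcess();
   myproc->loadOperator("./funcs/mysort.so");           // library with the four routines

   // "mysort" is the MROPERATION name (the "mr" parameter); Sector derives
   // the map/partition/comparison/reduce routines from it.
   if (myproc->run_mr(input, output, "mysort", 0) < 0)
      return -1;

   // Poll until the job completes; checkProgress() is assumed to return
   // percent complete, or a negative value on error.
   while (true)
   {
      SphereResult* res = NULL;
      if (myproc->read(res) <= 0)
      {
         if (myproc->checkProgress() < 0)
            return -1;
         if (myproc->checkProgress() == 100)
            break;
      }
      else
      {
         delete res;    // results are written to the bucket files; nothing to keep here
      }
   }

   myproc->close();
   client.releaseSphereProcess(myproc);
   client.logout();
   client.close();
   return 0;
}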

You can find the terasort and inverted index examples using MapReduce in mrsort.cpp and mrword.cpp, with the processing functions defined in ./examples/funcs/mr_sort.cpp and ./examples/funcs/mr_word.cpp.