Tuple MapReduce for Hadoop: MapReduce made easy

Download IEEE ICDM 2012 Paper: Tuple MapReduce: beyond classic MapReduce

Some Features:

  • Easier MapReduce development.
  • Support for Tuples instead of just Key/Value pairs.
  • Secondary sorting as easy as it can get.
  • Built-in reduce-side joining capabilities.
  • Performance and flexibility.
  • Configuration by object instance instead of classes.
  • First-class multiple inputs & outputs.
  • Built-in serialization support for Thrift and ProtoStuff.
  • 100% Hadoop compatibility: 0.20.X, 1.X, 2.X and YARN.

Overview »

Easier Hadoop. Same performance.

Hadoop has a steep learning curve. Pangool aims to simplify Hadoop development without losing the performance and flexibility that the low-level Hadoop API provides.

The most common patterns that arise when writing MapReduce jobs are easier to implement with Pangool, and they run with performance similar to that of the low-level Hadoop API.

See more details! »

Secondary sorting.

Although it is commonly needed in parallel data processing, secondary sorting is a nightmare to accomplish with the standard Java MapReduce Hadoop API.

Check how easy secondary sorting is with Pangool:

job.setGroupByFields("word");
job.setOrderBy(new OrderBy()
  .add("word", Order.ASC)
  .add("count", Order.DESC));
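
To make the semantics of that configuration concrete, here is a minimal in-memory sketch in plain Java. It is not Pangool code and does not show how Pangool implements sorting. It only mimics the effect the snippet asks for: tuples are grouped by "word", and inside each group they arrive ordered by "count" descending. The class and method names (SecondarySortSketch, WordCount, groupAndSort) are hypothetical and exist only for this illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class SecondarySortSketch {
    // Hypothetical tuple with the two fields named in the snippet above.
    static final class WordCount {
        final String word;
        final int count;
        WordCount(String word, int count) { this.word = word; this.count = count; }
    }

    // Sort by word ASC, then count DESC, and collect the counts per word.
    static Map<String, List<Integer>> groupAndSort(List<WordCount> tuples) {
        Comparator<WordCount> byWordAsc = (a, b) -> a.word.compareTo(b.word);
        Comparator<WordCount> byCountDesc = (a, b) -> Integer.compare(b.count, a.count);

        List<WordCount> sorted = new ArrayList<>(tuples);
        sorted.sort(byWordAsc.thenComparing(byCountDesc));

        // LinkedHashMap keeps the groups in the sorted order they were first seen.
        Map<String, List<Integer>> groups = new LinkedHashMap<>();
        for (WordCount t : sorted) {
            groups.computeIfAbsent(t.word, k -> new ArrayList<>()).add(t.count);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<WordCount> input = Arrays.asList(
            new WordCount("b", 1), new WordCount("a", 2),
            new WordCount("a", 5), new WordCount("b", 3));
        System.out.println(groupAndSort(input)); // prints {a=[5, 2], b=[3, 1]}
    }
}
```

In a real job the sort happens during the shuffle rather than in memory, which is why a reducer can consume each group in the requested order without buffering it.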

More in the Introduction... »

Joins.

Although it is a common pattern when working with big data, joining heterogeneous data sources is extremely complex to implement with the standard Java MapReduce Hadoop API.

With Pangool it is as easy as it can get:

job.addIntermediateSchema(urlMapSchema);
job.addIntermediateSchema(urlRegisterSchema);
job.setGroupByFields("url");
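
The idea behind a reduce-side join can be sketched in plain Java, without Hadoop or Pangool. The sketch below is an illustration under assumptions, not Pangool's implementation: records from two differently shaped sources are routed to the same group by their shared "url" field, and each group is then combined. The record shapes and field names other than "url" (canonicalUrl, ip), as well as the class and method names, are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class ReduceSideJoinSketch {
    // Hypothetical record from the first source: url -> canonical url.
    static final class UrlMap {
        final String url, canonicalUrl;
        UrlMap(String url, String canonicalUrl) { this.url = url; this.canonicalUrl = canonicalUrl; }
    }

    // Hypothetical record from the second source: a visit to a url.
    static final class UrlRegister {
        final String url, ip;
        UrlRegister(String url, String ip) { this.url = url; this.ip = ip; }
    }

    // One reduce group: every record from both sources sharing the same url.
    static final class Group {
        final List<UrlMap> maps = new ArrayList<>();
        final List<UrlRegister> registers = new ArrayList<>();
    }

    // Inner join on url; each output line is "canonicalUrl<TAB>ip".
    static List<String> join(List<UrlMap> maps, List<UrlRegister> registers) {
        // "Shuffle" step: route records from both sources to the group for their url.
        Map<String, Group> groups = new TreeMap<>();
        for (UrlMap m : maps) {
            groups.computeIfAbsent(m.url, k -> new Group()).maps.add(m);
        }
        for (UrlRegister r : registers) {
            groups.computeIfAbsent(r.url, k -> new Group()).registers.add(r);
        }

        // "Reduce" step: within each group, pair every mapping with every register.
        List<String> out = new ArrayList<>();
        for (Group g : groups.values()) {
            for (UrlMap m : g.maps) {
                for (UrlRegister r : g.registers) {
                    out.add(m.canonicalUrl + "\t" + r.ip);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<UrlMap> maps = Arrays.asList(
            new UrlMap("a.com/x", "a.com"), new UrlMap("b.com/y", "b.com"));
        List<UrlRegister> registers = Arrays.asList(
            new UrlRegister("a.com/x", "10.0.0.1"),
            new UrlRegister("a.com/x", "10.0.0.2"),
            new UrlRegister("c.com/z", "10.0.0.3"));
        // Only a.com/x appears in both sources, so two joined lines are printed.
        for (String line : join(maps, registers)) {
            System.out.println(line);
        }
    }
}
```

Urls present in only one source produce no output here, which makes this an inner join; an outer join would emit a line for groups where one of the two lists is empty.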

Deep details are in the User Guide! »