scatter {rmr2}    R Documentation

Functions to split a file over several parts or to merge multiple parts into one

Description

scatter takes a file as input and pushes it through a mapreduce job that writes it out over a number of parts. The number of parts is system dependent; specifically, it depends on the number of reducers. Splitting the file this way helps parallelize the map phase of a subsequent job. gather does the opposite: it merges multiple parts into one.

Usage

scatter(input, output = NULL, ...)
gather(input, output = NULL, ...)

Arguments

input

The input file

output

The output; the default is the same as for the output argument of mapreduce

...

Other options passed directly to mapreduce
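
As a sketch of how an option can be forwarded through ..., the call below passes the backend.parameters argument of mapreduce to request a number of reducers. The property name mapred.reduce.tasks and the value 10 are assumptions for illustration; the right property depends on the Hadoop version in use. The local backend is selected only so that the sketch runs without a cluster, in which case the Hadoop setting is expected to have no effect.

```r
library(rmr2)

# Assumption: run on the local backend so no Hadoop cluster is needed.
rmr.options(backend = "local")

# Write a small vector to the (local) file system.
input <- to.dfs(1:100)

# Forward backend.parameters to mapreduce through `...`.
# Assumption: "mapred.reduce.tasks" is the reducer-count property on the
# target Hadoop installation; it only matters on the hadoop backend.
parts <- scatter(
  input,
  backend.parameters = list(hadoop = list(D = "mapred.reduce.tasks=10")))

# Read the values back; their order may differ from the input.
scattered <- values(from.dfs(parts))
```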

Value

Same as for mapreduce.

Known Limitations

scatter discards keys. This limitation should be addressed in a future version.
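
Examples

A minimal sketch of a round trip through scatter and gather, assuming the local backend so that it runs without Hadoop. The variable names are illustrative only. The last line retrieves the keys of the scattered data so the limitation described above can be inspected.

```r
library(rmr2)

# Assumption: use the local backend for a quick, cluster-free trial.
rmr.options(backend = "local")

# Write a small vector to the (local) file system.
input <- to.dfs(1:1000)

# Split the data over several parts, then merge the parts back into one.
parts <- scatter(input)
merged <- gather(parts)

# The values survive the round trip, though not necessarily in order.
round_trip <- values(from.dfs(merged))

# Inspect the keys of the scattered data; scatter does not preserve them.
scattered_keys <- keys(from.dfs(parts))
```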


[Package rmr2 version 3.3.1 Index]