rhive-apply {RHive}    R Documentation

R Distributed apply function using HQL

Description

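Apply R functions to a Hive table in a distributed fashion using HQL. rhive.napply and rhive.sapply apply a single function; rhive.mrapply, rhive.mapapply and rhive.reduceapply run custom map and/or reduce functions.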

Usage

rhive.napply(tableName, FUN, ..., forcedRef=TRUE)
rhive.sapply(tableName, FUN, ..., forcedRef=TRUE)
rhive.mrapply(tableName, mapperFUN, reducerFUN, mapInput=NULL,
  mapOutput=NULL, by=NULL, reduceInput=NULL, reduceOutput=NULL,
  mapperArgs=NULL, reducerArgs=NULL, bufferSize=-1L, verbose=FALSE,
  forcedRef=TRUE)
rhive.mapapply(tableName, mapperFUN, mapInput=NULL, mapOutput=NULL,
  by=NULL, args=NULL, bufferSize=-1L, verbose=FALSE, forcedRef=TRUE)
rhive.reduceapply(tableName, reducerFUN, reduceInput=NULL,
  reduceOutput=NULL, args=NULL, bufferSize=-1L, verbose=FALSE,
  forcedRef=TRUE)

Arguments

tableName

name of the Hive table.

FUN

the function to be applied.

...

optional arguments to 'FUN'.

mapperFUN

a function which is executed on each worker node. The so-called mapper typically maps input key/value pairs to a set of intermediate key/value pairs.

reducerFUN

a function which is executed on each worker node. The so-called reducer reduces a set of intermediate values that share a key to a smaller set of values. If no reducer is used, leave this NULL.

mapInput

map-input column list.

mapOutput

map-output column list.

by

column used as the cluster key.

reduceInput

reduce-input column list.

reduceOutput

reduce-output column list.

bufferSize

streaming buffer size.

verbose

if TRUE, print the generated HQL.

args

custom environment.

mapperArgs

mapper custom environment.

reducerArgs

reducer custom environment.

forcedRef

option that forces creation of a temporary table to hold the result.

Author(s)

rhive@nexr.com

Examples

## connect to the Hive server
## Not run: rhive.connect("hive-server-ip")

## invoke napply for numeric return type
## Not run: rhive.napply('emp', function(item) {
    item * 10
}, 'sal')
## End(Not run)

## invoke sapply for string return type
## Not run: rhive.sapply('emp', function(item) {
    paste('NAME : ', item, sep='')
}, 'ename')
## End(Not run)
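
The difference between the numeric and string return types can be tried without a Hive connection. The following is only a local analogy using base R's vapply on an in-memory data frame; it does not call RHive, and the 'emp' data frame and its values are made up for illustration.

```r
# Local, RHive-free analogy: the same per-item functions applied to an
# in-memory data frame. 'emp' here is made-up sample data, not a Hive table.
emp <- data.frame(
    ename = c("SMITH", "ALLEN"),
    sal   = c(800, 1600),
    stringsAsFactors = FALSE
)

# Numeric result per item, as in the napply example.
raised <- vapply(emp$sal, function(item) item * 10, numeric(1))

# String result per item, as in the sapply example.
labels <- vapply(emp$ename,
    function(item) paste('NAME : ', item, sep = ''),
    character(1), USE.NAMES = FALSE)

print(raised)
print(labels)
```

vapply is used here because it makes the expected return type explicit (numeric(1) versus character(1)), which mirrors the reason for choosing between the two functions.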

## custom map/reduce script
## Not run: map <- function(k, v) {
    if(is.null(v)) {
        put(NA, 1)
    }
    lapply(v, function(vv) {
        lapply(strsplit(x = vv, split = "\t")[[1]], 
            function(w) put(paste(args, w, sep = ""), 1))
    })
}
 
reduce <- function(k, vv) {
    put(k, sum(as.numeric(vv)))
}
 
rhive.mrapply("emp", map, reduce,
    mapInput=c("ename", "position"), mapOutput=c("position", "one"),
    by="position",
    reduceInput=c("position", "one"), reduceOutput=c("position", "count"))
## End(Not run)
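
The emit-then-sum pattern in the map/reduce example can be sketched locally to see what the reducer receives. This is an assumption-laden illustration of the general map, group-by-key, reduce flow in plain R; it does not reproduce how RHive or Hive actually stream or cluster data, and 'positions', 'mapped', 'grouped' and 'reduce_local' are hypothetical names with made-up data.

```r
# Made-up values standing in for a 'position' column (not read from Hive).
positions <- c("CLERK", "SALESMAN", "CLERK", "MANAGER", "CLERK")

# Map step: emit one (key, 1) pair per input value.
mapped <- data.frame(key = positions, value = 1, stringsAsFactors = FALSE)

# Grouping step: collect the emitted values that share a key.
grouped <- split(mapped$value, mapped$key)

# Reduce step: same summing logic as the 'reduce' example, but returning
# the total instead of calling put().
reduce_local <- function(k, vv) sum(as.numeric(vv))

counts <- vapply(names(grouped),
    function(k) reduce_local(k, grouped[[k]]),
    numeric(1))

print(counts)
```

Each key reaches the reducer once together with all of its values, which is why the reducer body only needs to sum its second argument.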


## close connection
## Not run: rhive.close()

[Package RHive version 2.0-0.10 Index]