Details
Type: Task
Status: Open
Priority: Major
Resolution: Unresolved
Description
In several scripts, there is a need to apply rather complex functions to each cell of a matrix. The most natural way of expressing that, especially if the function involves loops and branches, is a DML-bodied function over scalars, called from surrounding loops over all cells of the matrix. Below is an artificial example:
foo = function( Double in ) return ( Double out ){
  x = in^2;
  if( x > in*2 )
    x = x - in/3;
  out = x + 7;
}
R = matrix(0, rows=nrow(A), cols=ncol(A));
for( i in 1:nrow(A) )
  for( j in 1:ncol(A) )
    R[i,j] = foo(A[i,j])
On large data in particular, however, this causes severe performance problems. Accordingly, people usually "vectorize" these operations by hand, which is unfortunately not easy for very complex functions. For the example above, the vectorized form is:
R = A^2 - ppred(A^2, A*2, ">")*A/3 + 7;
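The equivalence between the scalar function and the hand-vectorized expression can be checked with a small sketch. It is written in plain Python rather than DML, the sample matrix values are made up, and `vectorized_cell` is a hypothetical helper that evaluates the vectorized expression one cell at a time, treating the `ppred` comparison as a 1.0/0.0 indicator.

```python
# Scalar reference: a direct transcription of the DML function foo.
def foo(v: float) -> float:
    x = v ** 2
    if x > v * 2:
        x = x - v / 3
    return x + 7


# Cell-wise form of the hand-vectorized expression
#   A^2 - ppred(A^2, A*2, ">") * A/3 + 7
# where the comparison is modeled as an indicator that is 1.0 or 0.0.
def vectorized_cell(v: float) -> float:
    indicator = 1.0 if v ** 2 > v * 2 else 0.0
    return v ** 2 - indicator * v / 3 + 7


# Made-up sample values covering both branches of the condition.
A = [[-1.5, 0.0, 1.0], [2.0, 2.5, 4.0]]

R_loop = [[foo(v) for v in row] for row in A]
R_vec = [[vectorized_cell(v) for v in row] for row in A]
```

The branch becomes a multiplication by an indicator, which is the step that gets harder to do by hand as the function grows more loops and nested conditions.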
For this reason, we would like to integrate a second-order builtin function, apply, that would allow users to run their custom functions with reasonable performance:
R = apply(A, foo);
We would initially constrain this builtin function to DML-bodied functions with (1) a single scalar input and a single scalar output, (2) no nested function invocations, and (3) no creation of arbitrarily large intermediates (we assume a small memory footprint per cell). These constraints would allow us to provide a very efficient new unary apply operation (multi-threaded in CP, a narrow transformation in distributed backends).
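As a rough illustration of why these constraints make parallel execution straightforward, here is a minimal Python sketch of a row-partitioned apply. It is not SystemML code: `apply_cells`, its `num_threads` parameter, and the sample matrix are hypothetical, and Python threads only show the partitioning structure rather than real speedups. Because each cell depends only on its own input value, rows can be processed independently and reassembled in order.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

Matrix = List[List[float]]


# Scalar-in / scalar-out function, transcribed from the DML example.
def foo(v: float) -> float:
    x = v ** 2
    if x > v * 2:
        x = x - v / 3
    return x + 7


def apply_cells(A: Matrix, fn: Callable[[float], float],
                num_threads: int = 4) -> Matrix:
    """Hypothetical apply: run fn on every cell, one task per row."""
    def apply_row(row: List[float]) -> List[float]:
        # No shared state: each output cell depends on one input cell.
        return [fn(v) for v in row]

    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        # map() returns results in input order, so row order is preserved.
        return list(pool.map(apply_row, A))


A = [[-1.5, 0.0, 1.0], [2.0, 2.5, 4.0]]  # made-up sample values
R = apply_cells(A, foo)
```

The same independence is what would make the operation a narrow transformation in a distributed backend: each partition of the matrix can be mapped to its output partition without exchanging data with other partitions.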