SystemDS / SYSTEMDS-220

New Second-Order Builtin Function 'apply'

Details

    • Type: Task
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Compiler, Parser, Runtime

    Description

      In several scripts, there is a need to apply rather complex functions to each cell of a matrix. The most natural way to express this, especially if the function involves loops and branches, is a DML-bodied function over scalars, called from surrounding loops over all cells of the matrix. Below is an artificial example:

      foo = function( Double in ) return ( Double out ){
        x = in^2;
        if( x > in*2 )
          x = x - in/3;
        out = x + 7;
      }
      
      R = matrix(0, rows=nrow(A), cols=ncol(A));
      for( i in 1:nrow(A) )
        for( j in 1:ncol(A) )
          R[i,j] = foo(A[i,j]);
      

      On large data in particular, however, this would cause severe performance problems. Accordingly, people usually "vectorize" these operations by hand, which is unfortunately not easy for very complex functions.

      R = A^2 - ppred(A^2, A*2, ">")*A/3 + 7;
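
      To make the equivalence between the two forms concrete, here is a minimal sketch in plain Python (not DML). It assumes that ppred yields a 0/1 indicator per cell, and it stands in nested lists for matrices. The helper names cellwise and vectorized are hypothetical and chosen only for this illustration.

```python
def foo(v: float) -> float:
    """Scalar function from the DML example above."""
    x = v ** 2
    if x > v * 2:
        x = x - v / 3
    return x + 7


def cellwise(A):
    """Call foo once per cell, mirroring the nested for-loops."""
    return [[foo(v) for v in row] for row in A]


def vectorized(A):
    """Branch-free form: the comparison becomes a 0/1 mask (assumed ppred
    behavior) that switches the subtraction on or off per cell."""
    return [
        [v ** 2 - float(v ** 2 > v * 2) * v / 3 + 7 for v in row]
        for row in A
    ]


A = [[-1.0, 0.0, 1.0], [2.0, 3.0, 4.5]]
R_loop = cellwise(A)
R_vec = vectorized(A)
```

      Both results agree cell by cell. The branch in foo turns into a multiplication by the mask, and this is the rewrite that becomes hard to do by hand once the function contains loops or several nested branches.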
      

      For this reason, we would like to integrate a second-order builtin function apply that would allow users to use their custom functions with reasonable performance.

      R = apply(A, foo);
      

      We would initially constrain this builtin function to DML-bodied functions with (1) a single scalar input and a single scalar output, (2) no nested function invocations, and (3) no creation of arbitrarily large intermediates (we assume a small memory footprint per cell). These constraints would allow us to provide a very efficient new unary apply operation (multi-threaded in CP, a narrow transformation in distributed backends).
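
      The constraints above imply that each output cell depends only on the matching input cell, so the work can be partitioned freely. Here is a minimal sketch of that idea in plain Python, not the proposed SystemDS implementation. The apply helper, its num_threads parameter, and the row-wise partitioning are all assumptions made for illustration. Python threads are used only to show the partitioning and give no real speedup for this kind of work.

```python
from concurrent.futures import ThreadPoolExecutor


def foo(v: float) -> float:
    """Scalar in, scalar out, no nested calls (constraints 1 and 2)."""
    x = v ** 2
    if x > v * 2:
        x = x - v / 3
    return x + 7


def apply(A, fn, num_threads=4):
    """Hypothetical cell-wise apply over a matrix given as nested lists.

    Rows are independent of each other, so each one can be handed to a
    separate worker without any coordination between workers.
    """
    def process_row(row):
        return [fn(v) for v in row]

    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        # map() preserves input order, so R has the same shape as A
        return list(pool.map(process_row, A))


A = [[0.0, 1.0, 3.0], [4.0, -2.0, 0.5]]
R = apply(A, foo)
```

      The same independence is what would make the operation a narrow transformation in a distributed backend, since each partition of A can be processed without exchanging data with other partitions.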

          People

            Assignee: Unassigned
            Reporter: Matthias Boehm (mboehm7)
            Votes: 0
            Watchers: 7

              Time Tracking

                Original Estimate: 160h
                Remaining Estimate: 160h
                Time Spent: Not Specified