SystemDS / SYSTEMDS-413

Runtime refactoring core matrix block library

Details

    • Type: Task
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Runtime
    • Labels: None

    Description

      Pull the local (non-distributed) linear algebra components of SystemML into a separate package. Define a proper object-oriented Java API for creating and manipulating local matrices. Document this API. Refactor all tests of local linear algebra functionality so that those tests use the new API. Refactor the distributed linear algebra operators (both Spark and Hadoop map-reduce) to use the new APIs for local linear algebra.

      Overall Refactoring Plan
      The MatrixBlock class will be the core locus of refactoring. The file is over 6000 lines long, has dependencies on the HOPS and LOPS layers, and contains a lot of sparse matrix code that really ought to be in SparseBlock. Even if it’s modified in place, MatrixBlock will bear little resemblance to its current form after the refactoring is completed. I recommend setting aside the current MatrixBlock class and creating new classes with equivalent functionality by copying appropriate blocks of code from the old class.

      Major changes to make relative to MatrixBlock:

      • We should create a new DenseMatrixBlock class that only covers dense linear algebra.
      • Sparse-specific code should be moved into the SparseBlock class.
      • Common functionality across dense and sparse should go into the MatrixValue superclass.
      • There should be a new class with a name like “Matrix” (we’ll need one anyway to serve as the public API) that contains a pointer to a MatrixValue and can switch between different representations. Ideally this class should be designed so that, in the future, it can serve as a matrix ADT that will wrap both local and distributed linear algebra.
      • Several fields (maxrow, maxcolumn, numGroups, and various estimates of future numbers of nonzeros) are used for stashing data that is only for internal SystemML use. Either put these into a different data structure or provide a generic mechanism for tagging a matrix block with additional application-specific data.
      • Clean up and simplify the multiple different initialization methods (different variants of the constructors and the methods init() and reset()). There should be one canonical method for each major type of initialization. Other methods that are shortcuts (e.g. reset() with no arguments) should call the canonical method internally.
      • Consider refactoring the variants of ternaryOperations() that support ctable() into something simpler that is called ctable() – perhaps a Java API that can take null values for the optional arguments.
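
      The class layout proposed above can be sketched roughly as follows. This is only an illustration under assumptions: LocalValue, DenseValue, SparseValue, and the method names are hypothetical stand-ins, not the existing MatrixValue, MatrixBlock, or SparseBlock code, and the map-based sparse storage is chosen for brevity rather than performance. It shows a wrapper that holds a pointer to the current representation, switches between dense and sparse, and routes the no-argument reset() through one canonical initializer.

```java
import java.util.HashMap;
import java.util.Map;

// All names here are hypothetical; none of this is existing SystemML code.

/** Functionality common to every local representation. */
abstract class LocalValue {
    final int rows;
    final int cols;

    LocalValue(int rows, int cols) {
        if (rows < 0 || cols < 0) throw new IllegalArgumentException("negative dimension");
        this.rows = rows;
        this.cols = cols;
    }

    abstract double get(int r, int c);
    abstract void set(int r, int c, double v);
    abstract long nonZeros();

    void check(int r, int c) {
        if (r < 0 || r >= rows || c < 0 || c >= cols)
            throw new IndexOutOfBoundsException(r + "," + c);
    }
}

/** Dense storage only: a row-major array. */
final class DenseValue extends LocalValue {
    private final double[] data;

    DenseValue(int rows, int cols) { super(rows, cols); data = new double[rows * cols]; }

    double get(int r, int c) { check(r, c); return data[r * cols + c]; }
    void set(int r, int c, double v) { check(r, c); data[r * cols + c] = v; }
    long nonZeros() {
        long n = 0;
        for (double v : data) if (v != 0.0) n++;
        return n;
    }
}

/** Sparse storage only: a map from linearized cell index to value. */
final class SparseValue extends LocalValue {
    private final Map<Long, Double> cells = new HashMap<>();

    SparseValue(int rows, int cols) { super(rows, cols); }

    double get(int r, int c) { check(r, c); return cells.getOrDefault(key(r, c), 0.0); }
    void set(int r, int c, double v) {
        check(r, c);
        if (v == 0.0) cells.remove(key(r, c)); else cells.put(key(r, c), v);
    }
    long nonZeros() { return cells.size(); }
    private long key(int r, int c) { return (long) r * cols + c; }
}

/** Public wrapper that owns a LocalValue and can switch representations. */
final class Matrix {
    private LocalValue value;

    Matrix(int rows, int cols) { reset(rows, cols, false); }

    /** Canonical initializer: every other reset variant delegates here. */
    void reset(int rows, int cols, boolean sparse) {
        value = sparse ? new SparseValue(rows, cols) : new DenseValue(rows, cols);
    }

    /** Shortcut: same shape and representation, all cells zero. */
    void reset() { reset(value.rows, value.cols, isSparse()); }

    int rows() { return value.rows; }
    int cols() { return value.cols; }
    boolean isSparse() { return value instanceof SparseValue; }
    double get(int r, int c) { return value.get(r, c); }
    void set(int r, int c, double v) { value.set(r, c, v); }
    long nonZeros() { return value.nonZeros(); }

    void toSparse() { convert(true); }
    void toDense() { convert(false); }

    // Copies the nonzero cells into a fresh block of the other representation.
    private void convert(boolean sparse) {
        if (sparse == isSparse()) return;
        LocalValue target = sparse ? new SparseValue(value.rows, value.cols)
                                   : new DenseValue(value.rows, value.cols);
        for (int r = 0; r < value.rows; r++)
            for (int c = 0; c < value.cols; c++) {
                double v = value.get(r, c);
                if (v != 0.0) target.set(r, c, v);
            }
        value = target;
    }
}
```

      Because callers only ever hold a Matrix, the representation can change underneath them, which is also what would let the same wrapper front a distributed implementation later.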

      Other changes outside MatrixBlock:

      • The matrix classes currently depend on Hadoop I/O classes like Writable and DataInputBuffer. A local linear algebra library really shouldn’t require Hadoop. I/O methods that use Hadoop APIs should be factored out into a separate package. In particular, MatrixValue needs to be separated from Hadoop’s WritableComparable API.
      • The contents of the following packages need to move to the new library: sysml.runtime.functionobjects and sysml.runtime.matrix.operators
      • The library will need local input and output functions. I haven’t found suitable functions yet, but they may be hidden somewhere; if so, the existing functions should be moved next to the other local linear algebra code.
      • Utility functions under classes in sysml.runtime.util that the local code relies on will need to be replicated in the new library.
      • The more obscure subclasses of MatrixValue (MatrixCell, WeightedCell, etc.) do NOT need to be moved over.
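
      One possible shape for the Hadoop separation is sketched below, assuming a hypothetical PlainDenseBlock data class and a hypothetical DenseBlockIO helper (neither exists in the codebase). The data class implements no I/O interface at all, and the helper depends only on java.io. Since Hadoop's Writable methods themselves take java.io.DataOutput and java.io.DataInput, a thin Writable adapter living in a separate Hadoop-specific package could delegate to the same helper.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical Hadoop-free dense block: plain data, no I/O interfaces. */
final class PlainDenseBlock {
    final int rows;
    final int cols;
    final double[] data; // row-major

    PlainDenseBlock(int rows, int cols) {
        this.rows = rows;
        this.cols = cols;
        this.data = new double[rows * cols];
    }

    double get(int r, int c) { return data[r * cols + c]; }
    void set(int r, int c, double v) { data[r * cols + c] = v; }
}

/** Hypothetical local I/O helper that uses only java.io types. */
final class DenseBlockIO {
    private DenseBlockIO() {}

    /** Illustrative format: rows, cols, then every cell in row-major order. */
    static void write(PlainDenseBlock b, DataOutput out) throws IOException {
        out.writeInt(b.rows);
        out.writeInt(b.cols);
        for (double v : b.data) out.writeDouble(v);
    }

    static PlainDenseBlock read(DataInput in) throws IOException {
        int rows = in.readInt();
        int cols = in.readInt();
        PlainDenseBlock b = new PlainDenseBlock(rows, cols);
        for (int i = 0; i < b.data.length; i++) b.data[i] = in.readDouble();
        return b;
    }

    /** Convenience wrapper for in-memory round trips. */
    static byte[] toBytes(PlainDenseBlock b) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            write(b, out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    static PlainDenseBlock fromBytes(byte[] bytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            return read(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

      The byte layout here is purely illustrative and is not the format SystemML writes today.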

            People

              Assignee: Unassigned
              Reporter: Matthias Boehm (mboehm7)
              Votes: 1
              Watchers: 3