Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: SystemDS 3.1
    • Fix Version/s: SystemDS 3.1
    • Component/s: federated
    • Labels: None

    Description

      Allow multiple coordinators to share the same federated workers.


        Activity

          Commit bd94ebe10f5280e454e86bc726a3320b120845f5 in systemds's branch refs/heads/main from baunsgaard
          [ https://gitbox.apache.org/repos/asf?p=systemds.git;h=bd94ebe10f ]

          SYSTEMDS-3185 Federated Lineage reuse test fix

          This commit fixes the failing tests on master by relaxing the
          requirements of the multi-tenant tests so that they no longer
          count operations. The tests were failing for two reasons:
          1. they were previously not run on GitHub Actions, and
          2. the operation counts are no longer reset, to better support
          the federated monitoring tool.

          Commit 942a3a2a349cee2fcf3591e7850051538cc41fef in systemds's branch refs/heads/main from ywcb00
          [ https://gitbox.apache.org/repos/asf?p=systemds.git;h=942a3a2a34 ]

          SYSTEMDS-3185 Docs and cleanup multi-tenant federated learning

          Closes #1627.

          Commit 8e832ac085b14aa63ecd8a5baee463ac9dfa53bc in systemds's branch refs/heads/main from ywcb00
          [ https://gitbox.apache.org/repos/asf?p=systemds.git;h=8e832ac085 ]

          SYSTEMDS-3185 Caching of serialized federated responses

          Closes #1611.
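
          The commit message gives only the title, so the following is a minimal sketch of one plausible reading: a worker keeps the serialized bytes of a response so that repeated requests for the same object (for example, from several coordinators) do not serialize it again. It assumes plain Java serialization and a string key derived from the object's lineage; the class SerializedResponseCache and its methods are hypothetical and not part of SystemDS.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical cache of serialized responses, keyed by the lineage of the returned object. */
public class SerializedResponseCache {
    private final ConcurrentHashMap<String, byte[]> cache = new ConcurrentHashMap<>();
    // Counts how often serialization actually ran (for illustration only).
    final AtomicInteger serializations = new AtomicInteger();

    /** Returns cached bytes for the key, serializing the payload only on the first request. */
    public byte[] getOrSerialize(String lineageKey, Serializable payload) {
        return cache.computeIfAbsent(lineageKey, k -> serialize(payload));
    }

    private byte[] serialize(Serializable payload) {
        serializations.incrementAndGet();
        try (ByteArrayOutputStream bos = new ByteArrayOutputStream();
             ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(payload);
            oos.flush(); // drain buffered object data before reading the bytes
            return bos.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        SerializedResponseCache c = new SerializedResponseCache();
        double[] block = {1.0, 2.0, 3.0};
        byte[] first = c.getOrSerialize("rand(3,1,seed=7)", block);  // request from tenant A
        byte[] second = c.getOrSerialize("rand(3,1,seed=7)", block); // request from tenant B
        System.out.println(first == second);        // true: same cached array
        System.out.println(c.serializations.get()); // 1
    }
}
```

Keying by lineage is what makes sharing safe in this sketch: two requests with the same lineage trace refer to the same value, so one byte array can serve both as long as callers treat it as read-only.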

          Commit 19e656e09eda8310cc9c31f67a73c6496f7ff27b in systemds's branch refs/heads/main from ywcb00
          [ https://gitbox.apache.org/repos/asf?p=systemds.git;h=19e656e09e ]

          SYSTEMDS-3185 Lineage trace federated broadcast slices

          This patch addresses the problem of distinguishing between different
          slices of the same broadcast data object. The previous logic identified
          broadcast slices by their original data object, which led to incorrect
          reuse of two different slices from the same original file. This patch
          manually creates a right-indexing (rightIndex) lineage trace for each
          slice to identify it uniquely.

          Closes #1574
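
          To illustrate why a per-slice trace matters, here is a simplified sketch in which a lineage item is just an opcode with inputs. The record Lineage and the helper sliceLineage are hypothetical stand-ins, not the SystemDS LineageItem API; the rightIndex item records the source trace together with the row and column bounds, so two slices of one object get different cache keys.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SliceLineageSketch {
    /** Simplified stand-in for a lineage item: an opcode plus its inputs (items or literals). */
    record Lineage(String opcode, List<Object> inputs) {
        @Override public String toString() { return opcode + inputs; }
    }

    /** Hypothetical helper: wraps the source lineage in a rightIndex item with the slice bounds. */
    static Lineage sliceLineage(Lineage source, long rl, long ru, long cl, long cu) {
        return new Lineage("rightIndex", List.of(source, rl, ru, cl, cu));
    }

    public static void main(String[] args) {
        Lineage x = new Lineage("read", List.of("X.csv"));
        Map<Lineage, String> reuseCache = new HashMap<>();

        // Old behavior (simplified): every slice is keyed by the original object,
        // so a lookup for rows 51-100 finds the entry stored for rows 1-50.
        reuseCache.put(x, "slice rows 1-50");
        System.out.println(reuseCache.get(x)); // slice rows 1-50 (false reuse)

        // New behavior (simplified): each slice gets its own rightIndex trace.
        reuseCache.clear();
        Lineage s1 = sliceLineage(x, 1, 50, 1, 10);
        Lineage s2 = sliceLineage(x, 51, 100, 1, 10);
        reuseCache.put(s1, "slice rows 1-50");
        System.out.println(reuseCache.get(s2)); // null: no false reuse
        System.out.println(s2);                 // rightIndex[read[X.csv], 51, 100, 1, 10]
    }
}
```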

          Commit 33dca00323cb611cc2028894e734fbedbcd9bf67 in systemds's branch refs/heads/main from ywcb00
          [ https://gitbox.apache.org/repos/asf?p=systemds.git;h=33dca00 ]

          SYSTEMDS-3185 Pair matrix objects with lineage traces for federated broadcast

          This patch introduces the MatrixLineagePair, a wrapper around a MatrixObject
          and its corresponding LineageItem, which couples the lineage information with
          the matrix until the matrix is broadcast to the federated worker. The lineage
          trace is then included in the PUT request for the matrix and transferred to the
          federated worker. To this end, this PR also refactors the federated instructions
          to use MatrixLineagePairs instead of MatrixObjects wherever matrices are broadcast.

          Closes #1559
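
          A minimal sketch of the coupling described above, using simplified stand-ins: Matrix and Lineage replace the SystemDS MatrixObject and LineageItem, the record here only mirrors the idea of MatrixLineagePair rather than its real fields, and PutRequest and Worker are hypothetical. It shows the pair travelling together until broadcast, with the lineage carried inside the PUT request and recorded by the worker.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class LineagePairSketch {
    // Simplified stand-ins for MatrixObject and LineageItem (hypothetical fields).
    record Matrix(String name, double[] values) {}
    record Lineage(String trace) {}

    /** Couples a matrix with its lineage until broadcast (illustration, not the real class). */
    record MatrixLineagePair(Matrix mo, Lineage li) {}

    /** Hypothetical PUT request that carries the lineage next to the data. */
    record PutRequest(long varId, double[] data, Lineage lineage) {}

    static PutRequest toPut(long varId, MatrixLineagePair p) {
        return new PutRequest(varId, p.mo().values(), p.li());
    }

    /** Worker side: stores the data and remembers the lineage it arrived with. */
    static class Worker {
        final Map<Long, double[]> vars = new HashMap<>();
        final Map<Long, Lineage> lineage = new HashMap<>();

        void handle(PutRequest req) {
            vars.put(req.varId(), req.data());
            if (req.lineage() != null) // a matrix may arrive without lineage
                lineage.put(req.varId(), req.lineage());
        }
    }

    public static void main(String[] args) {
        MatrixLineagePair p = new MatrixLineagePair(
            new Matrix("B", new double[]{1, 2, 3}), new Lineage("rand(3,1,seed=42)"));
        Worker w = new Worker();
        w.handle(toPut(7L, p));
        System.out.println(Arrays.toString(w.vars.get(7L)) + " <- " + w.lineage.get(7L).trace());
    }
}
```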

          Commit 36e84d5fc020a150dc7d9952b24c652b453e4b13 in systemds's branch refs/heads/main from ywcb00
          [ https://gitbox.apache.org/repos/asf?p=systemds.git;h=36e84d5 ]

          SYSTEMDS-3185 Transfer lineage traces to federated workers

          This patch introduces the mechanics for transferring the lineage trace
          of a data object to the federated worker.
          For now, only the lineage traces of matrices produced by a datagen
          operation (e.g., rand()) are included, because they have the
          respective lineage item set in their CacheableData objects.

          Closes #1544
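
          The condition in the last paragraph can be sketched as follows. Data is a hypothetical stand-in for CacheableData whose lineage field is set only by a datagen factory (fromRand here), and buildPut is a hypothetical request builder that attaches a lineage trace only when one is present.

```java
import java.util.Optional;

public class LineageTransferSketch {
    /** Simplified stand-in for CacheableData: lineage is set only for datagen outputs. */
    static class Data {
        final String name;
        private String lineage; // stays null unless produced by a datagen op such as rand()

        Data(String name) { this.name = name; }

        static Data fromRand(String name, int rows, int cols, long seed) {
            Data d = new Data(name);
            d.lineage = "rand(" + rows + "," + cols + ",seed=" + seed + ")";
            return d;
        }

        Optional<String> getLineage() { return Optional.ofNullable(lineage); }
    }

    /** Hypothetical request builder: attaches the lineage only when the object carries one. */
    static String buildPut(Data d) {
        return d.getLineage()
            .map(li -> "PUT " + d.name + " [lineage=" + li + "]")
            .orElse("PUT " + d.name + " [no lineage]");
    }

    public static void main(String[] args) {
        System.out.println(buildPut(Data.fromRand("A", 100, 10, 7))); // PUT A [lineage=rand(100,10,seed=7)]
        System.out.println(buildPut(new Data("B")));                  // PUT B [no lineage]
    }
}
```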

          Commit de7d9c3caf451c3c587636023f76d0b6f1fadf6f in systemds's branch refs/heads/main from ywcb00
          [ https://gitbox.apache.org/repos/asf?p=systemds.git;h=de7d9c3 ]

          SYSTEMDS-3185 Handling of multi-threading in federated workers

          Closes #1535.

          Commit 2a44e83aa57ed47634c48958c36e34ea2d5eeae5 in systemds's branch refs/heads/main from ywcb00
          [ https://gitbox.apache.org/repos/asf?p=systemds.git;h=2a44e83 ]

          SYSTEMDS-3185 Federated Read Reuse - fix computetime and robustness

          This patch adds logic to look up both the lineage cache and the read
          cache, falling back to the read cache when an object is not available
          in the lineage cache. In addition, this patch fixes the recording of
          compute time for reads from disk.

          Closes #1542.
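
          A sketch of this lookup order under stated assumptions: both caches are plain maps, the lineage cache is keyed by the lineage trace and the read cache by file name, and a disk read populates both. Only the time spent in the actual disk read is recorded as compute time. All names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

public class ReadReuseLookupSketch {
    // Hypothetical lineage-cache entry that remembers how long the value took to produce.
    record Entry(double[] data, long computeNanos) {}

    final Map<String, Entry> lineageCache = new HashMap<>(); // keyed by lineage trace
    final Map<String, double[]> readCache = new HashMap<>(); // keyed by file name
    int diskReads = 0;

    double[] read(String lineageKey, String fileName, Function<String, double[]> disk) {
        Entry hit = lineageCache.get(lineageKey);
        if (hit != null)
            return hit.data();                    // 1. lineage cache hit
        double[] cached = readCache.get(fileName);
        if (cached != null)
            return cached;                        // 2. fall back to the read cache
        long start = System.nanoTime();
        double[] data = disk.apply(fileName);     // 3. actual read from disk
        long elapsed = System.nanoTime() - start; // compute time covers only the disk read
        diskReads++;
        lineageCache.put(lineageKey, new Entry(data, elapsed)); // assumption: fill both caches
        readCache.put(fileName, data);
        return data;
    }

    public static void main(String[] args) {
        ReadReuseLookupSketch w = new ReadReuseLookupSketch();
        Function<String, double[]> disk = f -> new double[]{1, 2, 3};
        w.read("read(X.csv)", "X.csv", disk);
        w.lineageCache.clear();               // e.g., the entry was evicted
        w.read("read(X.csv)", "X.csv", disk); // served from the read cache
        System.out.println(w.diskReads);      // 1
    }
}
```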

          Commit 34444e88ac3163c1eb72a2012d1426378fd67817 in systemds's branch refs/heads/main from ywcb00
          [ https://gitbox.apache.org/repos/asf?p=systemds.git;h=34444e8 ]

          SYSTEMDS-3185 Lineage-based reuse of federated reads

          This patch adds lineage-based reuse of federated reads on the workers.
          We fall back to the read cache if lineage-based reuse is globally disabled.

          Closes #1522
          Closes #1540
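
          A minimal sketch of the fallback: when a (hypothetical) global switch enables lineage-based reuse, reads are cached under their lineage trace; otherwise the read cache keyed by file name is used. The class, the switch, and the simplified trace string are assumptions for illustration, not SystemDS configuration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

public class FederatedReadReuseSketch {
    final boolean lineageReuseEnabled;                          // hypothetical global switch
    final Map<String, double[]> lineageCache = new HashMap<>(); // keyed by lineage trace of the read
    final Map<String, double[]> readCache = new HashMap<>();    // keyed by file name
    int diskReads = 0;

    FederatedReadReuseSketch(boolean lineageReuseEnabled) {
        this.lineageReuseEnabled = lineageReuseEnabled;
    }

    double[] read(String fileName, Supplier<double[]> disk) {
        Supplier<double[]> load = () -> { diskReads++; return disk.get(); };
        if (lineageReuseEnabled) {
            String key = "read(" + fileName + ")"; // simplified lineage trace of the read
            return lineageCache.computeIfAbsent(key, k -> load.get());
        }
        return readCache.computeIfAbsent(fileName, k -> load.get()); // fallback path
    }

    public static void main(String[] args) {
        FederatedReadReuseSketch on = new FederatedReadReuseSketch(true);
        FederatedReadReuseSketch off = new FederatedReadReuseSketch(false);
        for (FederatedReadReuseSketch w : new FederatedReadReuseSketch[]{on, off}) {
            w.read("X.csv", () -> new double[]{1, 2, 3});
            w.read("X.csv", () -> new double[]{1, 2, 3});
        }
        System.out.println(on.lineageCache.keySet() + " " + off.readCache.keySet()); // [read(X.csv)] [X.csv]
    }
}
```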

          Commit e62034d5f017d24e2d616fb7abd44b5d187bb58b in systemds's branch refs/heads/main from ywcb00
          [ https://gitbox.apache.org/repos/asf?p=systemds.git;h=e62034d ]

          SYSTEMDS-3185 Read cache/event loop for multi-tenant federated learning

          • Parallel event loops for multiple tenants
          • Read cache for reuse across tenants
          • Fix several federated instructions (missing wait)

          Closes #1521.
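
          One way to picture the first two bullets, sketched with java.util.concurrent rather than the worker's actual networking layer: each tenant (coordinator) gets its own single-threaded executor acting as an event loop, and all tenants share one read cache. ConcurrentHashMap.computeIfAbsent applies the loading function at most once per key, so concurrent tenants reading the same file trigger a single read. The class and method names are hypothetical.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class MultiTenantWorkerSketch {
    // One single-threaded "event loop" per tenant: a tenant's requests run in order,
    // while different tenants are served in parallel.
    private final Map<String, ExecutorService> loops = new ConcurrentHashMap<>();
    // Read cache shared by all tenants, keyed by file name.
    final Map<String, double[]> readCache = new ConcurrentHashMap<>();
    final AtomicInteger diskReads = new AtomicInteger();

    Future<double[]> submitRead(String tenant, String fileName) {
        ExecutorService loop = loops.computeIfAbsent(tenant, t -> Executors.newSingleThreadExecutor());
        return loop.submit(() -> readCache.computeIfAbsent(fileName, f -> {
            diskReads.incrementAndGet(); // simulated disk read
            return new double[]{1, 2, 3};
        }));
    }

    void shutdown() throws InterruptedException {
        for (ExecutorService loop : loops.values()) {
            loop.shutdown();
            loop.awaitTermination(5, TimeUnit.SECONDS);
        }
    }

    public static void main(String[] args) throws Exception {
        MultiTenantWorkerSketch w = new MultiTenantWorkerSketch();
        Future<double[]> a = w.submitRead("coordinator-A", "X.csv");
        Future<double[]> b = w.submitRead("coordinator-B", "X.csv");
        a.get();
        b.get();                               // wait for both results before using them
        System.out.println(w.diskReads.get()); // 1: the second tenant reuses the cached read
        w.shutdown();
    }
}
```

Waiting on the returned futures before using the results is analogous to the third bullet: code that skips the wait may observe results that are not yet complete.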

          Commit 1291aa9ee9794101973961d33da2d91415229dcb in systemds's branch refs/heads/main from ywcb00
          [ https://gitbox.apache.org/repos/asf?p=systemds.git;h=1291aa9 ]

          SYSTEMDS-3185 Refactor Statistics module

          This patch refactors the Statistics class by moving related
          statistics into their own classes in a new package, utils/stats.
          Moreover, this adds statistics for federated lookup table access
          and splits the federated PUT request count into separate counts
          for the individual object types.

          Closes #1487
          Closes #1519
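
          A sketch of the statistics split described above, assuming an enum of object types (the exact set of types counted in SystemDS may differ) and LongAdder counters so that concurrent worker threads can increment them cheaply. FedStatsSketch and its methods are hypothetical and are not the classes in utils/stats.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.concurrent.atomic.LongAdder;

/** Sketch of federated statistics with PUT counts split by object type (names hypothetical). */
public class FedStatsSketch {
    enum PutType { MATRIX, FRAME, SCALAR, LIST } // assumed set of object types

    // Fully populated in the constructor and only read afterwards, so the EnumMap is safe to share.
    private final Map<PutType, LongAdder> putCounts = new EnumMap<>(PutType.class);
    private final LongAdder lookupTableGets = new LongAdder();

    FedStatsSketch() {
        for (PutType t : PutType.values())
            putCounts.put(t, new LongAdder());
    }

    void incPut(PutType t) { putCounts.get(t).increment(); }
    void incLookupTableGet() { lookupTableGets.increment(); }
    long getPutCount(PutType t) { return putCounts.get(t).sum(); }

    long getTotalPutCount() {
        return putCounts.values().stream().mapToLong(LongAdder::sum).sum();
    }

    String display() {
        StringBuilder sb = new StringBuilder("Federated PUT (by type):");
        putCounts.forEach((t, c) -> sb.append(' ').append(t).append('=').append(c.sum()));
        sb.append("\nFederated lookup table gets: ").append(lookupTableGets.sum());
        return sb.toString();
    }

    public static void main(String[] args) {
        FedStatsSketch s = new FedStatsSketch();
        s.incPut(PutType.MATRIX);
        s.incPut(PutType.MATRIX);
        s.incPut(PutType.SCALAR);
        s.incLookupTableGet();
        System.out.println(s.display());
    }
}
```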

          Commit 5cc523971854cdf4f22e6199987a86e213fae4e2 in systemds's branch refs/heads/main from ywcb00
          [ https://gitbox.apache.org/repos/asf?p=systemds.git;h=5cc5239 ]

          SYSTEMDS-3185 Multi-tenant federated workers (variable isolation)

          Closes #1421.
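
          The commit message gives only the title; the following is a minimal sketch of variable isolation, assuming each tenant is identified by a hypothetical TenantId (host plus process ID here) and owns a separate map of variable IDs. Two coordinators can then use the same variable ID without overwriting each other's data. The class and the key are illustrative assumptions.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Sketch: per-tenant variable maps so two coordinators can reuse the same variable IDs. */
public class VariableIsolationSketch {
    // Hypothetical tenant identifier for a coordinator.
    record TenantId(String host, long pid) {}

    private final Map<TenantId, Map<Long, Object>> vars = new ConcurrentHashMap<>();

    private Map<Long, Object> ctx(TenantId t) {
        return vars.computeIfAbsent(t, k -> new ConcurrentHashMap<>());
    }

    void put(TenantId t, long id, Object value) { ctx(t).put(id, value); }
    Object get(TenantId t, long id) { return ctx(t).get(id); }
    void clear(TenantId t) { vars.remove(t); } // e.g., when a coordinator finishes

    public static void main(String[] args) {
        VariableIsolationSketch worker = new VariableIsolationSketch();
        TenantId a = new TenantId("10.0.0.1", 100);
        TenantId b = new TenantId("10.0.0.2", 200);
        worker.put(a, 1L, "matrix of A");
        worker.put(b, 1L, "matrix of B"); // same ID, different tenant: no overwrite
        System.out.println(worker.get(a, 1L) + ", " + worker.get(b, 1L)); // matrix of A, matrix of B
    }
}
```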

          People

              Assignee: Unassigned
              Reporter: David Weissteiner (ywcb00)
              Votes: 0
              Watchers: 2

            Dates

              Created:
              Updated:
              Resolved: