Details

    • Task
    • Status: Open
    • Major
    • Resolution: Unresolved

      Activity

        Commit 827438b953ed2bc6f6f4cc30a34df50231bea050 in systemds's branch refs/heads/main from Matthias Boehm
        [ https://gitbox.apache.org/repos/asf?p=systemds.git;h=827438b953 ]

        SYSTEMDS-3696 Fix incSliceLine flag for pruning strategies

        Recent experiments revealed that even with disabled pruning strategies
        incSliceLine was still faster than sliceLine on some datasets because
        the reevaluated top-K set was passed to the current top-K set from the
        beginning and thus used for additional score pruning. We now prevent
        this if score pruning is disabled.
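
The fix described above can be pictured with a small Python sketch. This is not the DML code of the builtin; the function name, its parameters, and the convention that the threshold starts at 0 are assumptions made purely for illustration. It shows how the reevaluated top-K scores would seed the pruning threshold only when score pruning is enabled.

```python
def initial_score_threshold(reevaluated_scores, k, score_pruning_enabled):
    """Return the starting score threshold for candidate pruning (hypothetical helper).

    reevaluated_scores: scores of the previous top-K slices, re-scored on the new data.
    """
    if not score_pruning_enabled:
        # Flag disabled: do not let previous results influence pruning at all.
        return 0.0
    top = sorted(reevaluated_scores, reverse=True)[:k]
    if len(top) < k:
        # Fewer than K known slices: no meaningful K-th best score yet.
        return 0.0
    # The K-th best reevaluated score is the bar a new slice has to clear;
    # clamp at 0 so negative or -inf scores never loosen the threshold.
    return max(0.0, top[-1])
```

With the flag disabled, the incremental run starts from the same threshold as a from-scratch run, which is what makes runtime comparisons between the two meaningful.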

        Commit 5fc93b607fa9b3b2d5dd359007721f79551a09d4 in systemds's branch refs/heads/main from Frederic Zoepffel
        [ https://gitbox.apache.org/repos/asf?p=systemds.git;h=5fc93b607f ]

        SYSTEMDS-3696 Extended incremental SliceLine state handling

        Closes #2116.

        Commit 3a73b77e4187d51ded0d0a5b81d32d3a1f407156 in systemds's branch refs/heads/main from Frederic Zoepffel
        [ https://gitbox.apache.org/repos/asf?p=systemds.git;h=3a73b77e41 ]

        SYSTEMDS-3696 Minor robustness fix and pruning flags

        Closes #2107.

        Commit 726d21d08aa417764123221e2f5ae95ff92bb4f9 in systemds's branch refs/heads/main from Matthias Boehm
        [ https://gitbox.apache.org/repos/asf?p=systemds.git;h=726d21d08a ]

        SYSTEMDS-3696 Additional pruning strategy for incremental slice line

        This patch adds a very effective pruning strategy which yields up to
        two orders of magnitude runtime improvements on Adult, Covtype, KDD98,
        and USCensus. However, this strategy only gives high-probability
        guarantees. In detail, we evaluate previously evaluated slices by
        adding the contribution of added and removed tuples in order to
        determine feature-wise high-probability upper bound scores which are
        in turn used to eliminate basic (single-feature) slices early on.
        Due to edge cases that might be missed, this strategy should not be
        applied by default (even though the tests pass). I will disable it by
        default when handling #2107 because that also touches the pruning selector.
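
As a rough illustration of "adding the contribution of added and removed tuples", the following Python sketch updates the size and error of a previously evaluated slice and rescores it. It assumes the score function from the SliceLine paper and uses hypothetical names and tuple layouts; it does not reproduce the high-probability upper bound itself, which depends on details not given in this message.

```python
def slice_score(slice_size, slice_error, n, total_error, alpha=0.5):
    """SliceLine-style score: relative error gain minus a size penalty (assumed formula)."""
    if slice_size == 0:
        return float("-inf")
    avg_error = total_error / n
    error_term = (slice_error / slice_size) / avg_error - 1
    size_term = n / slice_size - 1
    return alpha * error_term - (1 - alpha) * size_term


def updated_slice_score(prev, added, removed, n_new, total_error_new, alpha=0.5):
    """Rescore a slice from its previous (size, error) plus the changed tuples.

    prev, added, removed: (size, error) pairs restricted to this slice (hypothetical layout).
    """
    size = prev[0] + added[0] - removed[0]
    error = prev[1] + added[1] - removed[1]
    return slice_score(size, error, n_new, total_error_new, alpha)
```

Only the tuples that were added or removed need to be touched, so the cost of rescoring a known slice is independent of the size of the unchanged data.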

        Commit a973b1107567cca27a4c23ec3e230e17f00f46e7 in systemds's branch refs/heads/main from Frederic Zoepffel
        [ https://gitbox.apache.org/repos/asf?p=systemds.git;h=a973b11075 ]

        SYSTEMDS-3696 Fix edge cases incremental sliceline

        Closes #2106.

        Commit 95cfb76fee57e92d20c94b26b445d91819dfc5ee in systemds's branch refs/heads/main from Matthias Boehm
        [ https://gitbox.apache.org/repos/asf?p=systemds.git;h=95cfb76fee ]

        SYSTEMDS-3696 Fix incremental SliceLine naming conflicts in namespace

        This patch fixes the SliceLine builtin in order to allow joint use of
        SliceLine and incSliceLine without any naming conflicts in the
        .builtin namespace.

        Commit c1e8500e0704e0f254799f0425ee50006920b7b3 in systemds's branch refs/heads/main from Matthias Boehm
        [ https://gitbox.apache.org/repos/asf?p=systemds.git;h=c1e8500e07 ]

        SYSTEMDS-3696 Improved incremental slice line (pruning unchanged)

        This patch improves the pruning of unchanged slices below min-support
        with a more efficient selection and matching against enumerated slices.
        Now, on Adult the first incSliceLine runs in 27s (similar to sliceLine)
        but the second incSliceLine with few additional tuples runs in 3s.
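
The idea of pruning unchanged slices below min-support can be sketched in Python as follows. The data layout (a dictionary from (feature index, value) to the previous tuple count) and the function name are assumptions for illustration, and the sketch treats min-support as a fixed absolute count; the actual builtin works on matrices in DML.

```python
def prunable_basic_slices(prev_counts, changed_rows, min_sup):
    """Return basic slices that can be skipped in the incremental run.

    prev_counts:  {(feature_index, value): tuple count in the previous run}
    changed_rows: rows that were added or removed since the previous run
    min_sup:      minimum support as an absolute tuple count
    """
    # A basic slice is "touched" if any changed row carries its feature value.
    touched = {(j, v) for row in changed_rows for j, v in enumerate(row)}
    # Untouched slices keep their old size, so those below min_sup stay below it.
    return {s for s, count in prev_counts.items()
            if count < min_sup and s not in touched}
```

When only a few tuples are added, most basic slices are untouched, which is consistent with the large gap between the first and second run reported above.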

        Commit 472e69fb2ef7d0b30662d1ca313c1b25628f1a94 in systemds's branch refs/heads/main from Matthias Boehm
        [ https://gitbox.apache.org/repos/asf?p=systemds.git;h=472e69fb2e ]

        SYSTEMDS-3696 Additional pruning in incremental slice line

        Besides additional cleanups and smaller improvements, this patch adds
        a new pruning strategy that computes, for unchanged slices, the
        maximal reachable scores from previous runs, scales them according to
        the new data size and average errors, and uses these scores to
        prune all features whose maxsc is smaller than 0 or smaller than the
        scores of the previous top-k set evaluated on the new data.

        Commit 8b5d4cc2419b56877f0028e2d451c10d83327fdd in systemds's branch refs/heads/main from Matthias Boehm
        [ https://gitbox.apache.org/repos/asf?p=systemds.git;h=8b5d4cc241 ]

        SYSTEMDS-3696 Fix incremental slice line (unchanged pruning)

        Commit 254d680e465b3ccdf247878ef5f665ad12828daa in systemds's branch refs/heads/main from Matthias Boehm
        [ https://gitbox.apache.org/repos/asf?p=systemds.git;h=254d680e46 ]

        SYSTEMDS-3696 Improve incremental slice line (cleanup, robustness)

        • robust top-K maintenance for continuous score pruning
          w/o special cases for minsc handling
        • cleanup pruning strategies of basic input slices
        • robustness for -Inf in previous top-K evaluation
        • various vectorization of individual code snippets
        • improved error handling (via stop)

        Commit 5283544289b708e32756d5b145c01093fa032c4c in systemds's branch refs/heads/main from Frederic Zoepffel
        [ https://gitbox.apache.org/repos/asf?p=systemds.git;h=5283544289 ]

        SYSTEMDS-3696 Extended incremental slice line (pruning selector)

        Closes #2098.

        Commit f4e53ba17a4147ecfacb10b0c905f09397d7545b in systemds's branch refs/heads/main from Matthias Boehm
        [ https://gitbox.apache.org/repos/asf?p=systemds.git;h=f4e53ba17a ]

        SYSTEMDS-3696 Performance improvements incremental slice finding

        This patch is a performance fix-pack for incremental SliceLine, which
        improved its runtime from 90.4s to 52.2s on a particular scenario with
        the Adult dataset. In detail, the modifications include:

        • vectorized one-hot encoding: O(m^2*n^2) -> O(m*n)
        • vectorized scoring of previous top-k set
        • vectorized pruning of unchanged slices
        • vectorized removal of deleted tuples: O(n^2) -> O

        Furthermore, this patch also cleans up the wrong formatting (spaces
        instead of tabs) of the incremental slice finder tests.
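
To illustrate the one-hot encoding item in the list above, here is a Python sketch of an offset-based formulation in which every input cell is visited exactly once. It is only an illustration of the idea: the function name is hypothetical, the input is assumed to hold 1-based contiguous integer codes per feature, and the actual patch is written in DML with matrix operations rather than Python loops.

```python
from itertools import accumulate


def one_hot_encode(X):
    """One-hot encode rows of 1-based integer feature codes (assumed input format)."""
    m = len(X[0])
    # Domain size per feature, taken as the largest code seen in that column.
    domains = [max(row[j] for row in X) for j in range(m)]
    # Column offset of each feature's block in the encoded output.
    offsets = [0] + list(accumulate(domains))[:-1]
    width = sum(domains)
    encoded = []
    for row in X:
        out = [0] * width  # dense output row; a sparse format would avoid this cost
        for j, v in enumerate(row):
            # Target position is computed directly, without searching the domain.
            out[offsets[j] + v - 1] = 1
        encoded.append(out)
    return encoded
```

Because the target column is computed from the feature offset and the code, no nested search over rows or domain values is needed.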

        Commit 4b2a3ca7823599f19be23aa41038e658cdd0ff4e in systemds's branch refs/heads/main from Frederic Zoepffel
        [ https://gitbox.apache.org/repos/asf?p=systemds.git;h=4b2a3ca782 ]

        SYSTEMDS-3696 Improved incremental slice-line builtin

        Closes #2063.

        Commit 54d0a65145aa43338da4df55e75e6e1fa598e8e3 in systemds's branch refs/heads/main from Frederic Zoepffel
        [ https://gitbox.apache.org/repos/asf?p=systemds.git;h=54d0a65145 ]

        SYSTEMDS-3696 Improved incremental SliceLine (previous stats)

        Closes #2039.

        Commit 9e99f3c4c3bec42299fa5e48a0cb3bc3aea264be in systemds's branch refs/heads/main from Matthias Boehm
        [ https://gitbox.apache.org/repos/asf?p=systemds.git;h=9e99f3c4c3 ]

        SYSTEMDS-3696 New sliceLineDebug built-in function for usability

        This patch adds a new sliceLineDebug function to present the top-k
        worst slices returned from sliceLine (slicefinder) in a human-readable
        format. This is the output for the Salaries dataset:

        sliceLineDebug:
        -- Slice #1: score=0.4041683676825298, size=248.0
        ---- avg error=6.558681888351787E8, max error=8.524558818262574E9
        ---- predicate: "rank" = "Prof" AND "sex" = "Male"
        -- Slice #2: score=0.3731763935666855, size=42.0
        ---- avg error=8.271958572009121E8, max error=4.553584116646141E9
        ---- predicate: "rank" = "Prof" AND "yrs.since.phd" = 31.25
        -- Slice #3: score=0.3675193573989536, size=125.0
        ---- avg error=6.758211389786526E8, max error=8.524558818262574E9
        ---- predicate: "rank" = "Prof" AND "discipline" = "B" AND "sex" = "Male"
        -- Slice #4: score=0.35652331744984933, size=266.0
        ---- avg error=6.307265846260264E8, max error=8.524558818262574E9
        ---- predicate: "rank" = "Prof"

        Commit 5ec8d0c06a99cdf1250d2d85c6dbc8e43e84ea19 in systemds's branch refs/heads/main from Frederic Zoepffel
        [ https://gitbox.apache.org/repos/asf?p=systemds.git;h=5ec8d0c06a ]

        SYSTEMDS-3696 Basic incremental slice-line builtin, and tests

        Closes #2024.

        People

          Unassigned Unassigned
          christina_dionysio Christina Dionysio
          Votes: 0
          Watchers: 2
