Details
Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Description
On Kmeans, the fusion heuristic fnr fails with an index-out-of-bounds error on distributed (i.e., Spark) codegen row operations. The root cause is misplaced metadata management that implicitly assumes the first side input is a broadcast. This assumption fails if the side input is also large and is therefore passed as an additional RDD input. Specifically, it fails when executing the following operator:
public final class TMP64 extends SpoofRowwise {
  public TMP64() {
    super(RowType.COL_AGG_B1_T, -1, false, 1);
  }
  protected void genexec(double[] a, int ai, SideInput[] b, double[] scalars, double[] c, int len, int rix) {
    LibSpoofPrimitives.vectOuterMultAdd(a, b[0].values(rix), c, ai, b[0].pos(rix), 0, len, b[0].clen);
  }
  protected void genexec(double[] avals, int[] aix, int ai, SideInput[] b, double[] scalars, double[] c, int alen, int len, int rix) {
    LibSpoofPrimitives.vectOuterMultAdd(avals, b[0].values(rix), c, aix, ai, b[0].pos(rix), 0, alen, len, b[0].clen);
  }
}
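To make the failure mode more concrete, here is a minimal standalone sketch of how wrong side-input metadata could lead to an index-out-of-bounds error in an outer-product accumulate like the one in the dense genexec call above. It is not the SystemML implementation: the class OuterMultAddSketch and its vectOuterMultAdd method are hypothetical stand-ins. The argument order (a, b, c, ai, bi, ci, len1, len2) is assumed from the generated call, and the stale column count is assumed purely for illustration.

```java
public class OuterMultAddSketch {
    // Simplified stand-in (assumption, not the library code) for a dense
    // outer-product accumulate: c[ci + i*len2 + j] += a[ai + i] * b[bi + j]
    static void vectOuterMultAdd(double[] a, double[] b, double[] c,
            int ai, int bi, int ci, int len1, int len2) {
        for (int i = 0; i < len1; i++)
            for (int j = 0; j < len2; j++)
                c[ci + i * len2 + j] += a[ai + i] * b[bi + j];
    }

    public static void main(String[] args) {
        double[] a = {1, 2};        // one row of the main input (len = 2)
        double[] bRow = {3, 4, 5};  // one row of the side input (true clen = 3)
        double[] c = new double[a.length * bRow.length]; // 2 x 3 output

        // Correct metadata: clen matches the side-input row length.
        vectOuterMultAdd(a, bRow, c, 0, 0, 0, a.length, 3);
        System.out.println(java.util.Arrays.toString(c));

        // Hypothetical stale metadata: clen = 4 reads past the end of bRow.
        try {
            vectOuterMultAdd(a, bRow, new double[c.length], 0, 0, 0, a.length, 4);
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("index out of bounds with wrong clen");
        }
    }
}
```

The loop bounds come entirely from the metadata passed in, not from the arrays themselves. If the column count or row position is derived under the wrong assumption about how the side input is provided, the accesses go out of range.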