Details
Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Description
In PIG-3835, I was trying to use vertex groups for unions. A union followed by a store works fine. But with a group-by after the union, as in
A = LOAD '/tmp/data' AS (f1:int,f2:int);
B = LOAD '/tmp/data2' AS (f1:int,f2:int);
C = UNION onschema A,B;
D = GROUP C by f1;
E = FOREACH D GENERATE group, SUM(C.f2);
store E into '/tmp/tezout' using PigStorage();
the ConcatenatedMergedKeyValuesInput on the reduce side had grouped records only within each input, not across all inputs.
For example, if A had the records
a 1
b 1
b 2
and B had
a 2
a 3
b 3
The records from ConcatenatedMergedKeyValuesInput for A and B were
a {1}, b {1,2}, a {2,3}, b {3}
while I was expecting
a {1,2,3}, b {1,2,3}
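
The difference between the observed and the expected grouping can be illustrated with a small stand-alone sketch. This is plain Python, not the Tez or Pig API: the function names `concatenated_groups` and `merged_groups` are hypothetical, and the sketch only assumes that each input arrives sorted by key, as in the example data above.

```python
import heapq
from itertools import groupby
from operator import itemgetter

# Key-sorted (key, value) records for the two union inputs from the example.
input_a = [("a", 1), ("b", 1), ("b", 2)]
input_b = [("a", 2), ("a", 3), ("b", 3)]


def group_sorted(records):
    """Group a key-sorted record stream into (key, [values]) pairs."""
    return [
        (key, [value for _, value in group])
        for key, group in groupby(records, key=itemgetter(0))
    ]


def concatenated_groups(*inputs):
    """Group each input on its own, then append the results one after another.

    This mirrors the behaviour reported above: the same key can show up
    once per input.
    """
    result = []
    for records in inputs:
        result.extend(group_sorted(records))
    return result


def merged_groups(*inputs):
    """Merge all inputs by key first, then group across the merged stream.

    This mirrors the expected behaviour: each key appears exactly once.
    """
    merged = heapq.merge(*inputs, key=itemgetter(0))
    return group_sorted(merged)


observed = concatenated_groups(input_a, input_b)
expected = merged_groups(input_a, input_b)

# SUM per key, as the FOREACH ... SUM(C.f2) step would compute it
# when grouping spans both inputs.
sums = {key: sum(values) for key, values in expected}
```

With concatenation, a downstream SUM sees key a twice and key b twice, so it would emit partial sums per input instead of one total per key.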