Details
-
Bug
-
Status: Patch Available
-
Major
-
Resolution: Unresolved
-
0.12.0, 0.11.1, 0.12.1
-
None
-
None
-
Patch Available
Description
When passing multiple column keys for Correlation analysis, if coefficient value of one of the combinations is NaN, then the value for all other combinations is not computed.
Pearson Co-efficient value is NaN if all values for a given column are the same.
Example:
A = LOAD 'myData' USING org.apache.hcatalog.pig.HCatLoader();
B = group A all;
c = foreach B generate group, FLATTEN(COR((bag
) A.col_2, (bag
{tuple(double)}) A.col_3, (bag{tuple(double)}) A.col_4));
If the value of pearson coefficient for col_1 and col_2 is NaN, then value of co-efficients for all combinations is NaN
This is happening because of 'return null' statement in catch block on lines 157 and 235 in file org.apache.pig.builtin.COR.java
If the catch block is removed, then the correlation analysis would continue for the remaining columns. (ApachePig 0.12.0)