Details
Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 1.9.0, 1.9.1
None
None
Description
When I save a Parquet file with Spark, the schema looks like this:
optional group attref (LIST) {
  repeated group list {
    optional group element {
      optional binary nid (UTF8);
      optional binary nss (UTF8);
    }
  }
}
When I then read this file with parquet-pig-bundle, loading works, but accessing "nid" is a problem.
If I read a different file that was saved by the Pig storer and need the list of nid values, the Pig command is:
B = foreach A generate value.addr.clientIp_bag.clientIp, value.guid , value.attref.nid;
But to read the version saved by Spark, I need to use this:
B = foreach M generate value.addr.clientIp, value.guid , flatten(value.attref);
C = foreach B generate clientIp, guid, attref::element.nid;
However, this command flattens the column instead of keeping the nid values together as a list.
My question: does the Pig loader have a problem loading Parquet files saved by Spark?
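To make the difference concrete, here is a rough Python sketch of the two result shapes. It is not Pig or parquet-pig code. It only assumes, based on the `attref::element.nid` workaround above, that each entry of the Spark-written list arrives wrapped in an `element` record. The sample row and its values are made up for illustration.

```python
# Hypothetical sample row; the values are invented for illustration.
# Assumption: each list entry is wrapped in an "element" record, as the
# attref::element.nid workaround in the report suggests.
spark_row = {
    "guid": "g-1",
    "attref": [
        {"element": {"nid": "n1", "nss": "s1"}},
        {"element": {"nid": "n2", "nss": "s2"}},
    ],
}


def project_nid_flattened(row):
    """Mimic flatten(value.attref) followed by attref::element.nid:
    one output row per list entry."""
    return [(row["guid"], entry["element"]["nid"]) for entry in row["attref"]]


def project_nid_grouped(row):
    """Mimic the result wanted from value.attref.nid:
    a single output row that keeps all nid values together."""
    return (row["guid"], [entry["element"]["nid"] for entry in row["attref"]])


flattened = project_nid_flattened(spark_row)
grouped = project_nid_grouped(spark_row)

print(flattened)  # [('g-1', 'n1'), ('g-1', 'n2')]
print(grouped)    # ('g-1', ['n1', 'n2'])
```

The flattened form repeats guid once per list entry, which is the side effect described above. The grouped form is what the shorter projection returns for files written by the Pig storer.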