[PARQUET-1172] Question on pig loader read parquet file - ASF JIRA

Details

Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 1.9.0, 1.9.1
Fix Version/s: None
Component/s: parquet-mr, parquet-pig
Labels: None

Description

When I save a Parquet file with Spark, the schema looks like this:

optional group attref (LIST) {
  repeated group list {
    optional group element {
      optional binary nid (UTF8);
      optional binary nss (UTF8);
    }
  }
}

I then use parquet-pig-bundle to read this file. The read itself works, but I run into a problem when I need to access "nid".

If I read another file that was saved by the Pig storer and need the list of nid values, the Pig command is:

 
B = foreach A generate value.addr.clientIp_bag.clientIp, value.guid , value.attref.nid;

But to read the version saved by Spark, I need to use this:

B = foreach M generate value.addr.clientIp, value.guid , flatten(value.attref);
C = foreach B generate clientIp, guid, attref::element.nid;

and this command flattens the column, so I get one output row per list element instead of a single list of nid values.
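
The difference between the two sets of commands can be illustrated with a small model in plain Python. This is only a sketch, not parquet-pig code: the flat layout for the Pig-written file is an assumption inferred from the fact that `value.attref.nid` works there, the sample values are made up, and the helper names `project` and `project_nested` are hypothetical.

```python
# Plain-Python model of the two layouts once loaded (illustrative only).
# Assumption: the Pig-written file exposes each list entry as a flat (nid, nss) tuple,
# while the Spark-written file wraps each entry in an extra "element" group,
# matching the schema shown in this report.
pig_written = [
    {"nid": "a", "nss": "x"},
    {"nid": "b", "nss": "y"},
]
spark_written = [
    {"element": {"nid": "a", "nss": "x"}},
    {"element": {"nid": "b", "nss": "y"}},
]


def project(bag, field):
    """Mimic a bag projection such as attref.nid: pick one field from every tuple."""
    return [row[field] for row in bag]


def project_nested(bag, outer, inner):
    """Pick a field that sits one level deeper, as with element.nid."""
    return [row[outer][inner] for row in bag]


# Both calls yield the same list of nid values, but the Spark-written layout
# needs the extra "element" step to reach them.
pig_nids = project(pig_written, "nid")
spark_nids = project_nested(spark_written, "element", "nid")
```

In this model a direct projection of `nid` on the Spark-written layout fails because the only field at that level is `element`, which matches the need for the extra `attref::element.nid` step.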

My question is: does the Pig loader have a problem loading Parquet files saved by Spark?

People

Assignee: Unassigned

Reporter: abel_ke

Votes: 0

Watchers: 2

Dates

Created: 07/Dec/17 03:45

Updated: 23/Jun/24 03:29