[SPARK-21402] Fix java array of structs deserialization - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 2.1.1
Fix Version/s: 2.2.3, 2.3.3, 2.4.0
Component/s: SQL
Labels: None
Environment:

mac os
spark 2.1.1
Using Scala version 2.11.8, Java HotSpot(TM) 64-Bit Server VM, 1.8.0_121

Description

I have a dataset with the following schema:

root
 |-- userId: string (nullable = true)
 |-- data: map (nullable = true)
 |    |-- key: string
 |    |-- value: struct (valueContainsNull = true)
 |    |    |-- startTime: long (nullable = true)
 |    |    |-- endTime: long (nullable = true)
 |-- offset: long (nullable = true)

And I have the following classes (setters and getters omitted for brevity):

public class MyClass {

    private String userId;

    private Map<String, MyDTO> data;

    private Long offset;
 }

public class MyDTO {

    private long startTime;
    private long endTime;

}
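
For reference, here is a sketch of how the two beans might look with the omitted accessors filled in. `Encoders.bean` works from JavaBean getters and setters, so the accessors are assumed to follow the standard `getX`/`setX` convention; the reporter's actual accessors do not appear in this issue. The wrapper class `Beans` is hypothetical and exists only to keep the example in one file.

```java
import java.io.Serializable;
import java.util.Map;

// Hypothetical wrapper so that both beans fit in a single source file.
final class Beans {

    // Mirrors MyDTO from the report; the accessors are assumed, not quoted.
    public static class MyDTO implements Serializable {
        private long startTime;
        private long endTime;

        public long getStartTime() { return startTime; }
        public void setStartTime(long startTime) { this.startTime = startTime; }

        public long getEndTime() { return endTime; }
        public void setEndTime(long endTime) { this.endTime = endTime; }
    }

    // Mirrors MyClass from the report; the accessors are assumed, not quoted.
    public static class MyClass implements Serializable {
        private String userId;
        private Map<String, MyDTO> data;
        private Long offset;

        public String getUserId() { return userId; }
        public void setUserId(String userId) { this.userId = userId; }

        public Map<String, MyDTO> getData() { return data; }
        public void setData(Map<String, MyDTO> data) { this.data = data; }

        public Long getOffset() { return offset; }
        public void setOffset(Long offset) { this.offset = offset; }
    }

    public static void main(String[] args) {
        // Populate a DTO with the values shown later in the report.
        MyDTO dto = new MyDTO();
        dto.setStartTime(1498854000L);
        dto.setEndTime(1498870800L);
        System.out.println("startTime: " + dto.getStartTime());
        System.out.println("endTime: " + dto.getEndTime());
    }
}
```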

I collect the result as follows:

        Encoder<MyClass> myClassEncoder = Encoders.bean(MyClass.class);
        Dataset<MyClass> results = raw_df.as(myClassEncoder);
        List<MyClass> lst = results.collectAsList();

I do several calculations to get the result I want, and the result is correct at every step before I collect it.
This is the output, before collecting, of:

results.select(
        results.col("data").getField("2017-07-01").getField("startTime"),
        results.col("data").getField("2017-07-01").getField("endTime"))
    .show(false);

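+--------------------------+------------------------+
|data[2017-07-01].startTime|data[2017-07-01].endTime|
+--------------------------+------------------------+
|1498854000                |1498870800              |
+--------------------------+------------------------+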

This is the result after collecting the results:

MyClass userData = results.collectAsList().get(0);
MyDTO userDTO = userData.getData().get("2017-07-01");
System.out.println("userDTO startTime: " + userDTO.getStartTime());
System.out.println("userDTO endTime: " + userDTO.getEndTime());

Output:

userDTO startTime: 1498870800
userDTO endTime: 1498854000

The startTime and endTime values are swapped after collecting. I believe this is a Spark issue. I would appreciate any suggestions on how to work around it.
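
The swap is consistent with the struct values being bound to the bean's properties by position rather than by name. That is an assumption about the cause, not something this report establishes. The sketch below uses only the JDK (no Spark). It lists the properties of a `MyDTO`-like bean via `java.beans.Introspector`, which on common JDKs returns them sorted by name, and then binds the two values from the report to those properties by position and by name. The class `FieldOrderDemo` and its helper methods are hypothetical names introduced for illustration.

```java
import java.beans.IntrospectionException;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical demo class; it is not part of Spark or of the reporter's code.
final class FieldOrderDemo {

    // Same shape as MyDTO in the report, with assumed accessors.
    public static class MyDTO {
        private long startTime;
        private long endTime;

        public long getStartTime() { return startTime; }
        public void setStartTime(long startTime) { this.startTime = startTime; }
        public long getEndTime() { return endTime; }
        public void setEndTime(long endTime) { this.endTime = endTime; }
    }

    // Property names in the order the JavaBeans introspector reports them.
    // Passing Object.class as the stop class leaves out the "class" property.
    static List<String> beanPropertyNames(Class<?> beanClass) {
        try {
            List<String> names = new ArrayList<>();
            for (PropertyDescriptor pd
                    : Introspector.getBeanInfo(beanClass, Object.class).getPropertyDescriptors()) {
                names.add(pd.getName());
            }
            return names;
        } catch (IntrospectionException e) {
            throw new IllegalStateException(e);
        }
    }

    // Pairs the i-th value with the i-th property, ignoring the column names.
    static Map<String, Long> bindByPosition(List<String> propertyNames, long[] values) {
        Map<String, Long> bound = new LinkedHashMap<>();
        for (int i = 0; i < propertyNames.size(); i++) {
            bound.put(propertyNames.get(i), values[i]);
        }
        return bound;
    }

    // Looks up each property's value through the matching column name.
    static Map<String, Long> bindByName(List<String> propertyNames,
                                        List<String> columnNames,
                                        long[] values) {
        Map<String, Long> bound = new LinkedHashMap<>();
        for (String name : propertyNames) {
            bound.put(name, values[columnNames.indexOf(name)]);
        }
        return bound;
    }

    public static void main(String[] args) {
        List<String> columns = Arrays.asList("startTime", "endTime"); // schema order in the report
        long[] values = {1498854000L, 1498870800L};                   // values in the report
        List<String> props = beanPropertyNames(MyDTO.class);

        System.out.println("bean properties: " + props);
        System.out.println("bound by position: " + bindByPosition(props, values));
        System.out.println("bound by name: " + bindByName(props, columns, values));
    }
}
```

If positional binding is indeed what happens, a possible workaround until a fix is available would be to make the struct's field order match the bean's property order before calling `as(...)`, or to collect plain rows and read the fields by name. Neither has been verified against this report.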

Issue Links

is cloned by

SPARK-25772 Java encoders - switch fields on collectAsList

Resolved

is duplicated by

SPARK-21747 Java encoders - switch fields on collectAsList

Resolved

links to

[Github] Pull Request #22708 (vofque)

[Github] Pull Request #22745 (vofque)

[Github] Pull Request #22767 (vofque)

[Github] Pull Request #22768 (vofque)

People

Assignee: Vladimir Kuriatkov

Reporter: Tom

Votes: 1

Watchers: 9

Dates

Created: 13/Jul/17 14:12

Updated: 19/Oct/18 17:24

Resolved: 18/Oct/18 22:12