[SPARK-6677] pyspark.sql nondeterministic issue with row fields - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 1.3.0
Fix Version/s: 1.3.1, 1.4.0
Component/s: PySpark
Labels:
- pyspark
- row
- sql
Environment:

spark version: spark-1.3.0-bin-hadoop2.4
python version: Python 2.7.6
operating system: MacOS; x86_64 GNU/Linux

Description

The following issue happens only when running PySpark in the Python interpreter; the same code works correctly with spark-submit.

Reading two JSON files whose objects have different structures sometimes produces wrong Rows, in which the field names of one file are applied to the rows of the other.
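To make the symptom concrete, here is a minimal pure-Python sketch of a consistency check. It compares each row's field names against the keys of the JSON object the row was parsed from and reports any mismatch. It does not use PySpark and says nothing about the cause of the bug: `collections.namedtuple` stands in for `pyspark.sql.Row`, and the helpers `make_row` and `mismatched_rows` are hypothetical names. The sketch also assumes the rows and JSON lines are aligned one-to-one and that every JSON key is a valid Python identifier.

```python
import json
from collections import namedtuple


def make_row(field_names, values):
    """Build an immutable row with named fields.

    Hypothetical stand-in for pyspark.sql.Row; assumes every field name is a
    valid Python identifier (a namedtuple requirement).
    """
    Row = namedtuple("Row", field_names)
    return Row(*values)


def mismatched_rows(json_lines, rows):
    """Return (line, row) pairs whose field names differ from the JSON keys.

    Assumes json_lines[i] is the source of rows[i]. Field names are compared
    as sorted lists, so only a different set of names is reported, not a
    different order.
    """
    if len(json_lines) != len(rows):
        raise ValueError("json_lines and rows must have the same length")
    bad = []
    for line, row in zip(json_lines, rows):
        expected = sorted(json.loads(line).keys())
        if sorted(row._fields) != expected:
            bad.append((line, row))
    return bad


# Two "files" whose JSON objects have different structures.
file_a = ['{"name": "alice", "age": 30}']
file_b = ['{"city": "Rome", "zip": "00100"}']

good_a = [make_row(["age", "name"], [30, "alice"])]
good_b = [make_row(["city", "zip"], ["Rome", "00100"])]
# Simulated symptom: a row parsed from file_a but labeled with file_b's fields.
swapped_a = [make_row(["city", "zip"], [30, "alice"])]

print(mismatched_rows(file_a, good_a))          # []
print(mismatched_rows(file_b, good_b))          # []
print(len(mismatched_rows(file_a, swapped_a)))  # 1
```

Comparing sorted names keeps the check independent of field order, so only rows that carry a different set of field names (the situation described above) are flagged.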

I was able to write sample code that reproduces this issue about one time out of three; the code snippet is available at the following link, together with some (very simple) data samples:

https://gist.github.com/armisael/e08bb4567d0a11efe2db

Issue Links

links to

[Github] Pull Request #5445 (davies)

People

Assignee: Davies Liu

Reporter: Stefano Parmesan

Votes: 0

Watchers: 4

Dates

Created: 02/Apr/15 10:25

Updated: 12/Apr/15 13:01

Resolved: 12/Apr/15 05:34