Description
The following issue occurs only when running pyspark in the Python interpreter; it works correctly with spark-submit.
Reading two JSON files that contain objects with different structures sometimes produces wrong Rows, in which the fields from one file are used for the rows of the other.
I was able to write sample code that reproduces this issue about one time in three; the code snippet is available at the following link, together with some (very simple) data samples:
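
For illustration only (this is not the linked snippet), here is a minimal sketch of how such a check could look. It assumes the `SparkSession` API from Spark 2.0 or later, which may differ from the version this report was filed against. The file names, the field names, and the helpers `write_samples`, `unexpected_fields` and `check_with_spark` are all hypothetical.

```python
import json
import os


def write_samples(directory):
    """Write two one-line JSON files whose objects have different fields.

    The file names and field names are made up for this sketch.
    """
    path_a = os.path.join(directory, "a.json")
    path_b = os.path.join(directory, "b.json")
    with open(path_a, "w") as f:
        f.write(json.dumps({"id": 1, "name": "x"}) + "\n")
    with open(path_b, "w") as f:
        f.write(json.dumps({"code": "k", "value": 2.5}) + "\n")
    return path_a, path_b


def unexpected_fields(rows, expected):
    """Return the field names found in rows (dicts) that are not expected."""
    seen = set()
    for row in rows:
        seen.update(row.keys())
    return seen - set(expected)


def check_with_spark(path_a, path_b):
    """Read both files with Spark and report fields that leaked across files.

    pyspark is imported here so the helpers above stay usable without it.
    Assumes Spark 2.0+ (SparkSession, spark.read.json, Row.asDict).
    """
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .master("local[1]")
             .appName("json-rows-check")
             .getOrCreate())
    try:
        rows_a = [r.asDict() for r in spark.read.json(path_a).collect()]
        rows_b = [r.asDict() for r in spark.read.json(path_b).collect()]
    finally:
        spark.stop()
    return (unexpected_fields(rows_a, {"id", "name"}),
            unexpected_fields(rows_b, {"code", "value"}))
```

Calling `check_with_spark(*write_samples(some_dir))` from the interpreter should return two empty sets when the Rows are built correctly; a non-empty set would mean that fields from one file showed up in the rows of the other. Since the problem is intermittent, the call may need to be repeated several times.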