[HOP-3984] Not getting complete data in output while running on spark engine - ASF JIRA


When a simple pipeline consisting of a text file input and a text file output is run on the Spark engine, the pipeline does not write the complete data to the output file.

How to reproduce:

1) Create a simple pipeline with two transforms: Text file input and Text file output.

2) Use any simple CSV/TXT file in the Text file input transform.

3) Write the data to a TXT/CSV file using the Text file output transform.
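So that the expected count x is known before the pipeline runs, a test input with a fixed number of rows can be generated. The Python sketch below is not part of the original report: the function name `write_test_input`, the `id;name` columns, and the `;` separator are illustrative assumptions, so the Text file input transform would need to be configured to match.

```python
import csv


def write_test_input(path: str, rows: int) -> None:
    """Write a header line plus `rows` data lines (hypothetical test data).

    The file ends up with rows + 1 lines in total, so the expected
    line count of a pass-through pipeline is known in advance.
    """
    with open(path, "w", newline="") as fh:
        # Assumption: ';' as the field separator; adjust to the transform settings.
        writer = csv.writer(fh, delimiter=";")
        writer.writerow(["id", "name"])
        for i in range(1, rows + 1):
            writer.writerow([i, f"name_{i}"])


# Example: write_test_input("input_1000.txt", 1000) produces 1001 lines.
```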

If step 2 reads x lines, step 3 writes only y lines, where x > y.

Since the pipeline has no intermediate transforms, the output should be identical to the input, i.e. x should equal y.
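The x/y comparison can be checked outside Hop with a small script that counts raw lines in the input and output files. This is a minimal sketch, assuming both files are plain uncompressed text and that the input and output either both include a header line or both omit it; the function names are hypothetical.

```python
def count_lines(path: str) -> int:
    """Count lines in a file, including a final line without a trailing newline."""
    count = 0
    with open(path, "rb") as fh:
        # Iterating a binary file yields one chunk per b"\n"-terminated line,
        # plus any trailing unterminated line.
        for _ in fh:
            count += 1
    return count


def compare_row_counts(input_path: str, output_path: str) -> bool:
    """Print x (input lines) and y (output lines); return True if they match."""
    x = count_lines(input_path)
    y = count_lines(output_path)
    print(f"input lines (x): {x}, output lines (y): {y}")
    if x != y:
        print(f"mismatch: {x - y} line(s) missing from the output")
    return x == y


# Example (hypothetical output path):
# compare_row_counts("names.txt", "simple_mapping_output.txt")
```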

The counts still do not match when using zipped input, zipped output, or any other option in the input, output, or execution configuration windows.
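For the zipped scenario, the line count inside an archive such as names.zip can be obtained without extracting it by hand. A sketch using Python's standard `zipfile` module, assuming each archive member is a text file; the function name is hypothetical.

```python
import zipfile
from typing import Dict


def count_lines_in_zip(zip_path: str) -> Dict[str, int]:
    """Return {member name: line count} for every file entry in a ZIP archive."""
    counts: Dict[str, int] = {}
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # skip directory entries
            with archive.open(info) as member:
                # The member is a binary file object; iterating yields lines.
                counts[info.filename] = sum(1 for _ in member)
    return counts


# Example: count_lines_in_zip("names.zip")
```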

Attaching:

2) Pipeline with different scenarios - hop_pipeline_multiple_scenarios.hpl,

hop_pipeline_multiple_scenarios.hpl (7 kB, uploaded 08/Jun/22 12:46 by Utkarsh Singhal)
hop_pipeline_simple.hpl (33 kB, uploaded 08/Jun/22 12:46 by Utkarsh Singhal)
image-2022-06-08-18-08-06-655.png (6 kB, uploaded 08/Jun/22 12:38 by Utkarsh Singhal)
names.txt (38 kB, uploaded 08/Jun/22 12:44 by Utkarsh Singhal)
names.zip (0.3 kB, uploaded 08/Jun/22 12:44 by Utkarsh Singhal)
simple_mapping_output_2_20220608_172947.txt (59 kB, uploaded 08/Jun/22 12:48 by Utkarsh Singhal)
simple_mapping_output.txt_20220608_122959.txt (97 kB, uploaded 08/Jun/22 12:50 by Utkarsh Singhal)