Details
- Type: Improvement
- Status: Resolved
- Priority: Minor
- Resolution: Won't Fix
- Affects Version/s: 2.2.0
- Fix Version/s: None
- Component/s: None
Description
Hi,
I am using Spark 2.2.0 (a recent version) to read binary files from S3, via sc.binaryFiles.
It works fine for roughly the first 100 files, but then reads hang indefinitely, anywhere from 5 to 40 minutes, much like the Avro file read issue (which was fixed in later releases).
I tried raising fs.s3a.connection.maximum to a large value, but it didn't help. I also tried enabling Spark speculation, which didn't help much either.
One thing I observed is that the S3 connection is not closed after each binary file read.
Example: sc.binaryFiles("s3a://test/test123.zip")
Please look into this major issue!
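For reference, a minimal sketch of the read pattern and the tuning that was attempted. The bucket/path is the placeholder from the report above; the specific pool size and speculation settings are illustrative assumptions, not the exact values tried:

```scala
// Sketch (Spark shell, Scala). Settings below mirror the workarounds
// described in this report; the concrete values are assumptions.
sc.hadoopConfiguration.set("fs.s3a.connection.maximum", "200") // enlarged S3A pool; did not help
sc.getConf.set("spark.speculation", "true")                    // speculation; also did not help much

// Reported read pattern: each element is (path, PortableDataStream).
val files = sc.binaryFiles("s3a://test/test123.zip")
files.map { case (path, stream) =>
  val bytes = stream.toArray() // reads the file; connection reportedly not released afterwards
  (path, bytes.length)
}.collect()
```

If the PortableDataStream's underlying S3 connection is not returned to the pool after each read, the pool is eventually exhausted and subsequent reads block until a connection times out, which would match the 5-40 minute hangs described here.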
Issue Links
- is related to: HADOOP-14621 S3A client raising ConnectionPoolTimeoutException in test (Resolved)
- relates to: HADOOP-15071 s3a troubleshooting docs to add a couple more failure modes (Resolved)