Details
Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 3.1.1, 3.2.0, 3.3.0
Fix Version/s: None
Description
Spark on YARN cluster.
When multiple executors run on the same node and the same block exists on more than one of them, with one copy held in memory and another on disk, an executor occasionally fails to fetch the block and throws:
java.lang.ArrayIndexOutOfBoundsException: 0
at org.apache.spark.broadcast.TorrentBroadcast.$anonfun$readBlocks$1(TorrentBroadcast.scala:183)
The following steps replay how the problem occurs:
Step 1:
The executor asks the driver for the block's locations and status (locationsAndStatusOption). The request carries the BlockId and the host of the executor's own node; note that it does not carry a port.
line: 1092
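A minimal sketch of what that request looks like, using hypothetical stand-in types rather than the real Spark classes (the names SketchBlockId and GetLocationsAndStatus below are illustrative only): the executor identifies itself by host alone, so the driver cannot tell which of several co-located executors is asking.

case class SketchBlockId(name: String)
// Simplified stand-in for the request shape described above: there is no port field.
case class GetLocationsAndStatus(blockId: SketchBlockId, requesterHost: String)

object Step1Sketch {
  def main(args: Array[String]): Unit = {
    // Two executors on node-1 would send exactly the same request for this block.
    val request = GetLocationsAndStatus(SketchBlockId("broadcast_0_piece0"), requesterHost = "node-1")
    println(request)
  }
}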
Step 2:
On the driver side, the driver looks up all BlockManagers that hold this BlockId. In the non-remote-shuffle case, it takes the first BlockManager in the locations and returns that BlockManager's status for the block.
Assume two BlockManagers on this node hold the block: BM-1 stores it in memory and BM-2 stores it on disk.
If BM-1 happens to be first in the locations, the returned status is a memory status and its diskSize is 0.
line: 852, 856
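A hypothetical, simplified model of that driver-side lookup (the case classes are stand-ins, not the real Spark types): the status is taken from the head of the locations list only, so whichever BlockManager happens to be listed first decides the reported diskSize.

case class BlockStatus(useMemory: Boolean, useDisk: Boolean, memSize: Long, diskSize: Long)
case class Location(blockManagerId: String, host: String, status: BlockStatus)

object Step2Sketch {
  def main(args: Array[String]): Unit = {
    // Both BlockManagers live on the requester's node: BM-1 holds the block in memory,
    // BM-2 holds the same block on disk.
    val locations = Seq(
      Location("BM-1", "node-1", BlockStatus(useMemory = true, useDisk = false, memSize = 1024L, diskSize = 0L)),
      Location("BM-2", "node-1", BlockStatus(useMemory = false, useDisk = true, memSize = 0L, diskSize = 1024L)))

    // Only the first location's status is returned to the executor.
    val status = locations.headOption.map(_.status)
    println(status.map(_.diskSize)) // Some(0): the disk copy's real size is lost
  }
}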
Step 3:
The method returns a BlockLocationsAndStatus object. If any BlockManager on the requester's host stores the block on disk, that BlockManager's local directory paths are put into localDirs.
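Continuing the same simplified model (again with hypothetical stand-in types), this sketch shows the mismatch: localDirs is resolved from any disk-holding BlockManager on the requester's host, independently of which location supplied the status.

case class BlockStatus(useDisk: Boolean, diskSize: Long)
case class Location(blockManagerId: String, host: String, status: BlockStatus, localDirs: Array[String])

object Step3Sketch {
  def main(args: Array[String]): Unit = {
    val requesterHost = "node-1"
    val locations = Seq(
      Location("BM-1", "node-1", BlockStatus(useDisk = false, diskSize = 0L), Array("/data/bm-1")),
      Location("BM-2", "node-1", BlockStatus(useDisk = true, diskSize = 1024L), Array("/data/bm-2")))

    val status = locations.headOption.map(_.status)    // comes from BM-1: diskSize = 0
    val localDirs = locations                          // but the dirs come from BM-2
      .find(loc => loc.host == requesterHost && loc.status.useDisk)
      .map(_.localDirs)

    // The executor receives an inconsistent answer: disk dirs are present, yet diskSize is 0.
    println(s"diskSize=${status.map(_.diskSize)} localDirs=${localDirs.map(_.mkString(","))}")
  }
}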
Step 4:
When the executor receives locationsAndStatusOption, localDirs is non-empty, but status.diskSize is 0.
line: 1102
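A hedged sketch of the executor-side branch (simplified; readDiskBlockFromSameHostExecutor is reduced to a stub here): because localDirs is defined, the executor takes the local-disk path and passes status.diskSize, which is 0, as the number of bytes to read.

case class BlockStatus(diskSize: Long)
case class BlockLocationsAndStatus(status: BlockStatus, localDirs: Option[Array[String]])

object Step4Sketch {
  // Stub standing in for the real disk read: it just allocates blockSize bytes.
  def readDiskBlockFromSameHostExecutor(dirs: Array[String], blockSize: Long): Array[Byte] =
    new Array[Byte](blockSize.toInt)

  def main(args: Array[String]): Unit = {
    val answer = BlockLocationsAndStatus(BlockStatus(diskSize = 0L), Some(Array("/data/bm-2")))
    val bytes = answer.localDirs match {
      case Some(dirs) => readDiskBlockFromSameHostExecutor(dirs, answer.status.diskSize) // blockSize = 0
      case None => Array.empty[Byte] // the real code would fall back to a network fetch
    }
    println(s"fetched ${bytes.length} bytes") // fetched 0 bytes
  }
}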
Step 5:
readDiskBlockFromSameHostExecutor only checks whether the block file exists; it then reads a byte array of the caller-supplied block size directly. Since the block size passed in is 0, it returns an empty byte array.
line: 1234, 1240
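A stripped-down, hypothetical version of that read path (not the real implementation): the only validation is the existence check, and the caller-supplied block size is never compared with the actual file length, so a block size of 0 quietly produces an empty byte array even though the block really is on disk.

import java.io.{File, FileOutputStream, RandomAccessFile}

object Step5Sketch {
  def readDiskBlockFromSameHostExecutor(file: File, blockSize: Long): Option[Array[Byte]] = {
    if (!file.exists()) return None            // the only check that is made
    val buf = new Array[Byte](blockSize.toInt) // blockSize = 0 -> zero-length buffer
    val in = new RandomAccessFile(file, "r")
    try in.readFully(buf) finally in.close()   // reading 0 bytes always "succeeds"
    Some(buf)
  }

  def main(args: Array[String]): Unit = {
    val file = File.createTempFile("broadcast_0_piece0", ".blk")
    val out = new FileOutputStream(file)
    try out.write(Array.fill[Byte](1024)(1)) finally out.close() // the block file is not empty
    println(readDiskBlockFromSameHostExecutor(file, blockSize = 0L).map(_.length)) // Some(0)
    file.delete()
  }
}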
Taking a value from that empty result then causes the out-of-bounds exception shown above.
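Finally, a minimal illustration of the crash itself. The exact expression at TorrentBroadcast.scala:183 is not reproduced here; this only shows the assumed failure mode of indexing into an empty fetch result.

object Step6Sketch {
  def main(args: Array[String]): Unit = {
    val chunks: Array[Array[Byte]] = Array.empty[Array[Byte]] // empty fetch result
    val first = chunks(0) // throws java.lang.ArrayIndexOutOfBoundsException (index 0)
    println(first.length)
  }
}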