Details
- Type: Sub-task
- Status: Closed
- Priority: Major
- Resolution: Fixed
Description
The current unpartitioned cartesian product has a few limitations:
1. Parallelism can be insufficient when splits are large and the number of source tasks is small.
2. Parallelism can be excessive when the number of source tasks is large.
3. Workload is not evenly distributed across workers. Even with auto grouping, grouping by size may be inaccurate, because inputs of the same size can contain different numbers of records and therefore require different numbers of cartesian product operations.
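The three limitations can be illustrated with a minimal sketch. It assumes the unpartitioned scheme launches one product task per combination of source tasks, and that the cost of a product task is the product of the record counts it reads. The function names and the numbers are hypothetical and are not taken from the Tez code base.

```python
from itertools import product
from math import prod


def unpartitioned_tasks(src_task_counts):
    """Number of product tasks, assuming one task per combination of source tasks."""
    return prod(src_task_counts)


def ops_per_task(records_per_src_task):
    """Cartesian product operations handled by each product task.

    records_per_src_task holds one list per source vertex; each list gives
    the record count produced by each task of that source (hypothetical data).
    """
    return [prod(combo) for combo in product(*records_per_src_task)]


# Limitation 1: two sources with 2 tasks each yield only 4 product tasks,
# no matter how large each split is.
few = unpartitioned_tasks([2, 2])

# Limitation 2: two sources with 1000 tasks each yield 1,000,000 product tasks.
many = unpartitioned_tasks([1000, 1000])

# Limitation 3: outputs of similar byte size may hold very different record
# counts, so the work per product task can differ by orders of magnitude.
skewed = ops_per_task([[10, 1000], [10, 1000]])
print(few, many, skewed)
```

Under these assumptions the heaviest product task performs 10,000 times as many operations as the lightest one, which is the kind of skew that a record-count-aware ("fair") scheme would aim to even out.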
Attachments
Issue Links
- breaks
  - TEZ-3737 FairCartesianProductVertexMananger used incorrect #partition (Closed)
  - TEZ-3739 Fair CartesianProduct doesn't works well with huge difference in output size (Closed)
- is blocked by
  - TEZ-3697 Adding #output_record in vertex manager event payload (Closed)
- is related to
  - TEZ-3819 Round robin partitioner make fair cartesian product not fault tolerant (Patch Available)