Description
Currently, when a table is used in a join, we rely on the size returned by the metastore to decide whether the operation can be converted to a broadcast join. This optimization only kicks in for tables that have statistics available in the metastore. Hive generally falls back to HDFS when statistics are not available directly from the metastore, and this seems like a reasonable approach to adopt given the performance benefit of broadcast joins.
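The proposed fallback can be sketched roughly as follows. This is an illustrative sketch in plain Python, not Spark's actual code: `estimate_table_size`, `can_broadcast`, and the `filesystem_size` callback (standing in for a scan of the table's files on HDFS) are hypothetical names. The 10 MB constant mirrors the default of `spark.sql.autoBroadcastJoinThreshold`.

```python
from typing import Callable, Optional

# Mirrors the default of spark.sql.autoBroadcastJoinThreshold (10 MB).
AUTO_BROADCAST_JOIN_THRESHOLD = 10 * 1024 * 1024


def estimate_table_size(
    metastore_size: Optional[int],
    filesystem_size: Callable[[], int],
) -> int:
    """Return the table size in bytes, preferring metastore statistics."""
    if metastore_size is not None and metastore_size > 0:
        return metastore_size
    # Statistics are missing: fall back to the size of the files on disk.
    # The callback is a hypothetical stand-in for an HDFS lookup and is
    # only invoked when needed, since scanning the filesystem can be slow.
    return filesystem_size()


def can_broadcast(
    size_in_bytes: int,
    threshold: int = AUTO_BROADCAST_JOIN_THRESHOLD,
) -> bool:
    """A table qualifies for broadcast if its size fits under the threshold."""
    return 0 <= size_in_bytes <= threshold


# A table with no metastore statistics whose files total 4 MB on disk.
size = estimate_table_size(None, lambda: 4 * 1024 * 1024)
print(can_broadcast(size))  # True
```

Passing the filesystem lookup as a callback keeps the more expensive path lazy: it runs only when the metastore has no usable size.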
Issue Links
- Blocked: SPARK-20475 Whether use "broadcast join" depends on hive configuration (Closed)