Details
Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
Description
First of all, I apologize for submitting such a large patch. It broadly modifies and refactors the global planning, logical planning, and physical planning parts, so it was hard to split this issue into smaller ones.
In particular, this patch rewrites GlobalPlanner and MasterPlan (the global plan data structure) as follows:
- Removed GlobalPlanOptimizer
- Added the DirectedGraph interface, the SimpleDirectedGraph concrete class, and a visitor class that traverses a graph in post-order.
- Improved MasterPlan by using the new graph API
  - The query block graphs and the execution block graph are represented by SimpleDirectedGraph.
  - These graphs can now be traversed easily through the graph API.
- Added DataChannel class to represent a data flow between execution blocks.
- MasterPlan.toString() prints a text graph to represent relationships among execution blocks and a distributed plan.
- Added a more sophisticated explain feature for distributed plans and logical plans. It is very useful for plan debugging.
- The limit operator is now pushed down to the child execution block.
  - As a result, the intermediate data volume of a sort query with a limit is reduced significantly.
- TableSubQuery (inline view) is now supported and follows the SQL standard, so you can run a query such as the following:
SELECT *
FROM (
  SELECT l_orderkey, l_partkey, url
  FROM (
    SELECT l_orderkey, l_partkey,
           CASE WHEN l_partkey IS NOT NULL THEN ''
                WHEN l_orderkey = 1 THEN '1'
                ELSE '2'
           END AS url
    FROM lineitem
  ) res1
  JOIN (
    SELECT * FROM part
  ) res2 ON l_partkey = p_partkey
) result
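To illustrate the graph API mentioned above, here is a minimal sketch of a directed graph with a post-order visitor. Only the names DirectedGraph and SimpleDirectedGraph come from this issue; GraphVisitor, the method names (addEdge, getChildren, accept), and the block names eb1 to eb4 are assumptions made for illustration and may not match the classes in the patch.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical visitor callback; the visitor class in the patch may differ.
interface GraphVisitor<V> {
  void visit(V vertex);
}

// Hypothetical minimal interface; method names are assumptions.
interface DirectedGraph<V> {
  void addEdge(V child, V parent);
  List<V> getChildren(V parent);
  void accept(V root, GraphVisitor<V> visitor);
}

class SimpleDirectedGraph<V> implements DirectedGraph<V> {
  // Maps each vertex to its children, in insertion order.
  private final Map<V, List<V>> children = new LinkedHashMap<>();

  @Override
  public void addEdge(V child, V parent) {
    children.computeIfAbsent(parent, k -> new ArrayList<>()).add(child);
    children.computeIfAbsent(child, k -> new ArrayList<>());
  }

  @Override
  public List<V> getChildren(V parent) {
    return children.getOrDefault(parent, new ArrayList<>());
  }

  @Override
  public void accept(V root, GraphVisitor<V> visitor) {
    postOrder(root, visitor, new HashSet<>());
  }

  // Visits all children first, then the vertex itself.
  private void postOrder(V vertex, GraphVisitor<V> visitor, Set<V> seen) {
    if (!seen.add(vertex)) {
      return; // a vertex shared by several parents is visited only once
    }
    for (V child : getChildren(vertex)) {
      postOrder(child, visitor, seen);
    }
    visitor.visit(vertex);
  }
}

class GraphDemo {
  public static void main(String[] args) {
    // Two scan blocks feed a join block, which feeds a terminal block.
    SimpleDirectedGraph<String> graph = new SimpleDirectedGraph<>();
    graph.addEdge("eb1", "eb3");
    graph.addEdge("eb2", "eb3");
    graph.addEdge("eb3", "eb4");
    graph.accept("eb4", v -> System.out.println(v)); // prints eb1, eb2, eb3, eb4
  }
}
```

Post-order is a natural fit for an execution block graph because every child block is visited before the block that consumes its output.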
In addition, I've made the following refactorings:
- Column now has a qualifier name.
- Improved Schema to deal with qualified column names.
- When a TableDesc instance is retrieved, its columns are forced to be qualified.
- Fixed the TAJO-162 bug.
- Lots of trivial improvements and refactorings.
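
The qualified column handling described above can be sketched roughly as follows. This is an illustration only: QualifiedColumn, SketchSchema, and resolve are hypothetical names, and the lookup rule (exact match for qualified references, error on ambiguous simple names) is an assumption rather than the behavior of the Column and Schema classes in the patch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical column carrying a qualifier (for example a table or subquery alias).
final class QualifiedColumn {
  final String qualifier;
  final String name;

  QualifiedColumn(String qualifier, String name) {
    this.qualifier = qualifier;
    this.name = name;
  }

  String qualifiedName() {
    return qualifier + "." + name;
  }
}

// Hypothetical schema that resolves both qualified and simple column references.
final class SketchSchema {
  private final List<QualifiedColumn> columns = new ArrayList<>();

  SketchSchema add(String qualifier, String name) {
    columns.add(new QualifiedColumn(qualifier, name));
    return this;
  }

  // Returns the matching column, or null when nothing matches.
  // Throws when a reference matches more than one column.
  QualifiedColumn resolve(String ref) {
    boolean qualified = ref.contains(".");
    QualifiedColumn found = null;
    for (QualifiedColumn c : columns) {
      String candidate = qualified ? c.qualifiedName() : c.name;
      if (!candidate.equals(ref)) {
        continue;
      }
      if (found != null) {
        throw new IllegalArgumentException("Ambiguous column: " + ref);
      }
      found = c;
    }
    return found;
  }
}
```

With columns res1.l_partkey and res2.p_partkey registered, resolve("p_partkey") returns the res2 column, while a simple name that exists under two qualifiers is rejected as ambiguous and has to be written in qualified form.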