  1. Tajo (Retired)
  2. TAJO-184

Refactor GlobalPlanner and global plan data structure


Details

    Description

      First of all, I'm sorry for submitting such a big patch. It broadly modifies and refactors the global planning, logical planning, and physical planning parts, and it was hard to split the work into smaller issues.

      In particular, this patch rewrites GlobalPlanner and MasterPlan (the global plan data structure) as follows:

      • Removed GlobalPlanOptimizer
      • Added a DirectedGraph interface, a SimpleDirectedGraph concrete class, and a visitor class that traverses a graph in post-order.
      • Improved MasterPlan by using the new graph API:
        • The query block graphs and the execution block graph are represented by SimpleDirectedGraph.
        • These graphs can now be traversed easily through the graph API.
        • Added a DataChannel class to represent a data flow between execution blocks.
      • MasterPlan.toString() prints a text graph showing the relationships among execution blocks and the distributed plan.
      • Added a more detailed explain feature for distributed plans and logical plans. It is very useful for plan debugging.
      • The limit operator is now pushed down to the child execution block.
        • As a result, the intermediate data volume of a sort query with a limit is reduced significantly.
      • TableSubQuery (inline view) is now supported, following the SQL standard, so you can run a query such as:
        SELECT *
        FROM
        (
            SELECT
                l_orderkey,
                l_partkey,
                url
            FROM
                (
                  SELECT
                    l_orderkey,
                    l_partkey,
                    CASE
                      WHEN l_partkey IS NOT NULL THEN ''
                      WHEN l_orderkey = 1 THEN '1'
                      ELSE '2'
                    END AS url
                  FROM
                    lineitem
                ) res1
                JOIN
                (
                  SELECT
                    *
                  FROM
                    part
                ) res2
                ON l_partkey = p_partkey
        ) result
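
To illustrate the graph API described in the list above, here is a minimal sketch of a directed graph with a post-order visitor. It is not taken from the patch: every type and method name (DirectedGraphSketch, SimpleDirectedGraphSketch, VertexVisitor, accept) and the execution block ids eb1..eb4 are hypothetical stand-ins, and the real DirectedGraph and SimpleDirectedGraph signatures may differ.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical names throughout; this is NOT Tajo's actual API.
interface DirectedGraphSketch<V> {
    void addEdge(V child, V parent);   // data flows from child to parent
    List<V> getChildren(V parent);
}

interface VertexVisitor<V> {
    void visit(V vertex);
}

class SimpleDirectedGraphSketch<V> implements DirectedGraphSketch<V> {
    // parent -> children, kept in insertion order for a stable traversal
    private final Map<V, List<V>> children = new LinkedHashMap<>();

    @Override
    public void addEdge(V child, V parent) {
        children.computeIfAbsent(parent, k -> new ArrayList<>()).add(child);
        children.computeIfAbsent(child, k -> new ArrayList<>());
    }

    @Override
    public List<V> getChildren(V parent) {
        return children.getOrDefault(parent, Collections.emptyList());
    }

    // Post-order: every child subtree is visited before the vertex itself.
    public void accept(V root, VertexVisitor<V> visitor) {
        accept(root, visitor, new HashSet<>());
    }

    private void accept(V vertex, VertexVisitor<V> visitor, Set<V> seen) {
        if (!seen.add(vertex)) {
            return; // a vertex shared by several parents is visited once
        }
        for (V child : getChildren(vertex)) {
            accept(child, visitor, seen);
        }
        visitor.visit(vertex);
    }
}

class PostOrderDemo {
    // Two scan blocks feed a join block, which feeds a terminal block.
    static List<String> postOrder() {
        SimpleDirectedGraphSketch<String> graph = new SimpleDirectedGraphSketch<>();
        graph.addEdge("eb1", "eb3"); // scan of lineitem -> join
        graph.addEdge("eb2", "eb3"); // scan of part -> join
        graph.addEdge("eb3", "eb4"); // join -> terminal block
        List<String> order = new ArrayList<>();
        graph.accept("eb4", order::add);
        return order;
    }

    public static void main(String[] args) {
        System.out.println(postOrder()); // [eb1, eb2, eb3, eb4]
    }
}
```

Post-order suits a plan made of execution blocks because a block can only run once the blocks that feed it have been handled. In this sketch an edge plays roughly the role the description assigns to DataChannel, but it carries no metadata.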
        

      In addition, I've refactored as follows:

      • Column now has a qualifier name.
      • Improved Schema to deal with qualified column names.
      • When a TableDesc instance is retrieved, its columns are forced to be qualified.
      • Fixed the TAJO-162 bug.
      • Many minor improvements and refactorings.
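
As a rough illustration of what looking up qualified column names in a schema can involve, the sketch below resolves either a qualified reference such as lineitem.l_partkey or a bare column name. The classes QualifiedColumn and SchemaSketch and the method resolve are hypothetical and are not Tajo's Column or Schema classes. The rule that an unqualified name must match exactly one column is an assumption made for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins; NOT Tajo's Column/Schema classes.
final class QualifiedColumn {
    final String qualifier; // table or alias name; may be null
    final String name;

    QualifiedColumn(String qualifier, String name) {
        this.qualifier = qualifier;
        this.name = name;
    }

    String qualifiedName() {
        return qualifier == null ? name : qualifier + "." + name;
    }
}

final class SchemaSketch {
    private final List<QualifiedColumn> columns = new ArrayList<>();

    void add(String qualifier, String name) {
        columns.add(new QualifiedColumn(qualifier, name));
    }

    // Accepts "lineitem.l_partkey" or "l_partkey".
    // Assumed rule: an unqualified reference must match exactly one column.
    QualifiedColumn resolve(String ref) {
        boolean unqualified = ref.indexOf('.') < 0;
        List<QualifiedColumn> matches = new ArrayList<>();
        for (QualifiedColumn c : columns) {
            if (c.qualifiedName().equals(ref) || (unqualified && c.name.equals(ref))) {
                matches.add(c);
            }
        }
        if (matches.isEmpty()) {
            throw new IllegalArgumentException("unknown column: " + ref);
        }
        if (matches.size() > 1) {
            throw new IllegalArgumentException("ambiguous column: " + ref);
        }
        return matches.get(0);
    }
}

class SchemaDemo {
    public static void main(String[] args) {
        SchemaSketch schema = new SchemaSketch();
        schema.add("lineitem", "l_partkey");
        schema.add("part", "p_partkey");
        System.out.println(schema.resolve("l_partkey").qualifiedName());
        System.out.println(schema.resolve("part.p_partkey").qualifiedName());
    }
}
```

Keeping the qualifier on each column is what lets a join of two inline views, like res1 and res2 in the query above, tell apart columns that would otherwise share a name.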

      Attachments

        1. TAJO-184_2.patch
          445 kB
          Hyunsik Choi
        2. TAJO-184.patch
          442 kB
          Hyunsik Choi

        Activity

          People

            hyunsik Hyunsik Choi
            Votes: 0
            Watchers: 3
