  1. Ignite
  2. IGNITE-12449

Calcite integration. Execution flow.

Details

    • Task
    • Status: Resolved
    • Major
    • Resolution: Done

    Description

      We need to introduce a query execution environment.

      Execution should:

      • use a streaming approach
      • support suspend/resume
      • run in event loop threads

      Rough protocol description:

      The flow is defined as a tree of operations, such as filter, project, etc.
      Each node provides a sink to consume data from its children and has a target sink to push data to the upper node.
      The upper node may signal that it is ready to consume data. After a node receives the signal, it pushes data into its target sink until the sink reports that it has no room for new data; the node then stops pushing until the next signal.
      Some nodes (such as an inbox node, which describes remote input) may signal that new data is available; this forces the root node to signal its children, top to bottom.
      When the signal reaches an inbox node, the inbox starts pushing the new data.
      When a node determines that its data is exhausted, it sends an "end" signal to its target sink; after that, the upper node will not signal the node to continue pushing data.
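
      The protocol above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual Ignite code: the names Sink, Scan, Filter, Root, signal() and push() are hypothetical, push() is assumed to always accept the row and to return false once the sink has no room left, and for brevity the root signals the leaf directly because the filter in between holds no state.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Predicate;

class FlowSketch {
    /** Consumes rows pushed by a lower node (hypothetical interface). */
    interface Sink<T> {
        /** Accepts the row; returns false when there is no room for more. */
        boolean push(T row);

        /** The "end" signal: no more rows will follow. */
        void end();
    }

    /** Leaf node that pushes rows to its target sink only when signalled. */
    static final class Scan<T> {
        private final Iterator<T> rows;
        private final Sink<T> target;
        private boolean ended;

        Scan(Iterator<T> rows, Sink<T> target) {
            this.rows = rows;
            this.target = target;
        }

        /** Called by the upper node when it is ready to consume data. */
        void signal() {
            if (ended)
                return;
            while (rows.hasNext()) {
                if (!target.push(rows.next()))
                    return; // Sink is full: stop until the next signal.
            }
            ended = true;
            target.end();
        }
    }

    /** Intermediate node: forwards matching rows to its own target sink. */
    static final class Filter<T> implements Sink<T> {
        private final Predicate<T> cond;
        private final Sink<T> target;

        Filter(Predicate<T> cond, Sink<T> target) {
            this.cond = cond;
            this.target = target;
        }

        @Override public boolean push(T row) {
            // A rejected row takes no room, so the child may keep pushing.
            return !cond.test(row) || target.push(row);
        }

        @Override public void end() {
            target.end();
        }
    }

    /** Root node: accepts at most batchSize rows per signal. */
    static final class Root<T> implements Sink<T> {
        final List<T> result = new ArrayList<>();
        private final int batchSize;
        private int room;
        private boolean done;
        int signals;

        Root(int batchSize) {
            this.batchSize = batchSize;
        }

        @Override public boolean push(T row) {
            result.add(row);
            return --room > 0;
        }

        @Override public void end() {
            done = true;
        }

        /** Signals the child repeatedly until it reports the end of data. */
        void drain(Scan<T> child) {
            while (!done) {
                room = batchSize;
                signals++;
                child.signal();
            }
        }
    }

    static Root<Integer> demo() {
        Root<Integer> root = new Root<>(2);
        Filter<Integer> filter = new Filter<>(x -> x % 2 == 0, root);
        Scan<Integer> scan = new Scan<>(List.of(1, 2, 3, 4, 5, 6, 7, 8, 9, 10).iterator(), filter);
        root.drain(scan);
        return root;
    }

    public static void main(String[] args) {
        Root<Integer> root = demo();
        System.out.println(root.result + " after " + root.signals + " signals");
    }
}
```

      In this sketch the scan never pushes more than the root asked for: each signal opens room for two rows, and the scan returns from signal() as soon as push() reports that the sink is full.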

          Activity

            gvvinblade Igor Seliverstov added a comment -

            ustas, please look at
            ustas Ilya Suntsov added a comment -

            gvvinblade 

            1. What problem are you trying to solve with the new approach?
            2. What are the limitations of this approach? What is out of scope?

            mshonichev, please take a look at the description and share your thoughts.

             


            gvvinblade Igor Seliverstov added a comment -

            ustas, this is a brand new query execution flow (designed for the Calcite-based query engine) that uses a reactive approach and "push" semantics.

            It is needed to execute a query, possibly with cross dependencies, on a limited number of threads and without blocking operations. Dependency conflicts are resolved automatically by reordering operations: if an operation needs data that is unavailable at the moment, it stops executing until a data source signals that new data is available.

            In addition, the "push" approach reduces request-reply round trips, which lowers query execution latency.

            Back pressure is part of the protocol: if a consumer is not ready to process data, the producer stops sending until the consumer signals. This lets us use small operation buffers, which reduces the risk of OOM during execution. A reducer does not collect the whole dataset except in several cases, such as sorting, which can be handled effectively by introducing an index.
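
            The idea of an operation that stops instead of blocking when its input is missing could be illustrated roughly like this. It is a simplified sketch under assumptions, not the Ignite implementation: the event loop is modelled as a plain task queue drained on the calling thread, and Inbox, request(), onBatch() and runAll() are hypothetical names. For simplicity the inbox buffers whatever arrives rather than throttling the remote producer.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

class EventLoopSketch {
    /** Hypothetical inbox node: buffers remote rows and never blocks. */
    static final class Inbox {
        private final Queue<Runnable> loop;
        private final Queue<Integer> buffer = new ArrayDeque<>();
        private final List<Integer> out;
        private int requested;

        Inbox(Queue<Runnable> loop, List<Integer> out) {
            this.loop = loop;
            this.out = out;
        }

        /** Consumer signal: it is ready to take n more rows. */
        void request(int n) {
            requested += n;
            loop.add(this::drain);
        }

        /** Remote input: a new batch has arrived. */
        void onBatch(List<Integer> batch) {
            buffer.addAll(batch);
            loop.add(this::drain);
        }

        /** Number of rows received but not yet requested by the consumer. */
        int buffered() {
            return buffer.size();
        }

        private void drain() {
            while (requested > 0 && !buffer.isEmpty()) {
                out.add(buffer.poll());
                requested--;
            }
            // If data or demand is missing, the task just returns. The loop
            // stays free for other work until request() or onBatch() reschedules it.
        }
    }

    /** Runs queued tasks one at a time, as a single event loop thread would. */
    static void runAll(Queue<Runnable> loop) {
        Runnable task;
        while ((task = loop.poll()) != null)
            task.run();
    }

    static List<Integer> demo() {
        Queue<Runnable> loop = new ArrayDeque<>();
        List<Integer> out = new ArrayList<>();
        Inbox inbox = new Inbox(loop, out);

        inbox.request(3);             // Consumer is ready, but no data has arrived yet.
        runAll(loop);                 // drain() finds nothing and returns immediately.
        inbox.onBatch(List.of(1, 2)); // First remote batch resumes the inbox.
        runAll(loop);
        inbox.onBatch(List.of(3, 4)); // Only one more row was requested.
        runAll(loop);

        return out;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

            The start order does not matter here: whether the request or the batch comes first, the rows are delivered once both are present, and the fourth row stays in the buffer until the consumer signals again.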

            gvvinblade Igor Seliverstov added a comment -

            Current limitations: only a limited set of operation types is implemented: nested loop join, filter, project, exchange, table scan (sort and index scan are under development).

            What is out of scope: query parsing, planning, validation, etc.

            So, let's say you have several query fragments (executing on different nodes); each of them is an operation tree, and they may start executing in any order.

            What we need to prove:

            • Dependency conflicts are resolved as expected, so that, regardless of the initial start order, all the query parts eventually execute in the proper order.
            • There are no deadlocks.
            • There is no OOM: operation buffers are used as expected, except in known cases where an operation requires the full dataset (nested loop joins, sort operations, grouping operations, etc.).

            gvvinblade Igor Seliverstov added a comment -

            Done in scope of IGNITE-12448

            People

              gvvinblade Igor Seliverstov
              gvvinblade Igor Seliverstov
              Votes: 0
              Watchers: 2
