IMPALA-414

Impala server cannot detect crash-restart failures reliably


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Duplicate
    • Affects Version/s: Impala 1.0.1
    • Fix Version/s: None
    • Component/s: Distributed Exec

    Description

      The membership mechanism that tells Impala servers about failures does not always detect fast crash-restarts. If a server restarts and re-registers before the state-store recognises that it has failed, the failure is never reported to any other subscriber.

      The right way to fix this, I think, is to track a version number for every subscriber. Each time a subscriber reconnects, it gets a new, higher version number. For every query, we record the highest subscriber version number known when the query starts. Then if any backend executing the query has a higher version number, it is likely to have restarted since the query started. There might be a few false positives, since a node could conceivably restart between a scheduling assignment and actually receiving a query, but that's unlikely, and false positives are better than false negatives.
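
      A minimal sketch of the version-tracking idea, in Python for brevity. It is not Impala code: the names `Membership`, `Query`, `register`, and `restarted_backends` are hypothetical, and it assumes versions come from a single counter that only ever increases, so a re-registration always yields a version above any mark recorded earlier.

```python
class Membership:
    """Hypothetical stand-in for the state-store's view of its subscribers."""

    def __init__(self):
        self._next_version = 0   # assumed: one global, monotonically increasing counter
        self._versions = {}      # subscriber id -> version assigned at last registration

    def register(self, subscriber_id):
        """Called on first connect and on every reconnect; always hands out a new version."""
        self._next_version += 1
        self._versions[subscriber_id] = self._next_version
        return self._next_version

    def highest_version(self):
        """Highest version handed out so far."""
        return self._next_version

    def version_of(self, subscriber_id):
        return self._versions[subscriber_id]


class Query:
    """Hypothetical per-query record kept by the coordinator."""

    def __init__(self, membership, backends):
        self.backends = list(backends)
        # High-water mark: the highest subscriber version known when the query starts.
        self.version_at_start = membership.highest_version()

    def restarted_backends(self, membership):
        """Backends whose version is above the mark have re-registered since the query began."""
        return [b for b in self.backends
                if membership.version_of(b) > self.version_at_start]


membership = Membership()
membership.register("backend-a")
membership.register("backend-b")

query = Query(membership, ["backend-a", "backend-b"])
before = query.restarted_backends(membership)

# backend-b crashes and re-registers before any failure is reported.
membership.register("backend-b")
after = query.restarted_backends(membership)
```

      Here `before` is empty and `after` contains only `backend-b`, even though the membership list never showed `backend-b` as missing. A backend that re-registers after the mark is recorded but before it receives its part of the query would also be flagged, which is the false positive described above.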

            People

              Assignee: Unassigned
              Reporter: Henry Robinson (henryr)