Geode / GEODE-9007

Allow rebalancing of client subscription queues

Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: client queues
    • Labels: None

    Description

      In clusters where membership changes leave one server running while the others are restarted, as in a rolling restart, almost all clients can end up with the same server as the primary host for their client subscription queue, overloading that server. There is currently no mechanism to move client subscription queues from an overloaded server to a less loaded one, or to automatically reassign primary queues based on server load. As a result, if the cluster gets into a “bad” state, there is no straightforward way to remedy the situation.

      Goal

      Users should have a way to trigger a rebalance of all client subscription queues for currently connected clients in a cluster.

      Requirements

      • No queued events should be lost from client subscription queues during the rebalance process.
      • There should be no significant impact on performance during the rebalance process, both in terms of resource use in the cluster and continuous dispatching of events to clients.
      • The rebalance process should complete in a reasonable amount of time and not repeat steps.
      • Changes in cluster membership and client subscription should not impact the success of the rebalance process.
      • Once the rebalance is complete, the total number of primary queues hosted on each server should be as close as possible to the average number of primary queues per server. Depending on client configuration, it may not be possible to perfectly balance all servers, as certain clients may not have access to certain servers, restricting which “moves” are possible.
      • Once the rebalance is complete, the total number of queues hosted on each server should be as close as possible to the average number of queues per server. The caveat regarding the total number of primary queues per server also applies in this case.
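
      The balance requirement for primary queues can be illustrated with a minimal sketch. It is not Geode code and not a proposed implementation: the class name, the method names and the map-based inputs are all hypothetical. It assumes the current primary host of each client and the set of servers each client can reach are already known, and it greedily moves a primary only when the source server hosts at least two more primaries than the target. A greedy pass like this is not guaranteed to find the best possible balance when clients can reach only some servers.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical sketch for illustration only; not part of the Geode API.
final class PrimaryQueueRebalanceSketch {

    // Counts primary queues per server, including servers that host none.
    static Map<String, Integer> countPerServer(Map<String, String> primaryByClient,
                                               Set<String> servers) {
        Map<String, Integer> load = new TreeMap<>();
        for (String server : servers) {
            load.put(server, 0);
        }
        for (String host : primaryByClient.values()) {
            load.merge(host, 1, Integer::sum);
        }
        return load;
    }

    // Returns a new client -> primary server assignment. A primary is moved only
    // to a server the client can reach, and only if the move narrows the gap
    // between the two servers (source has at least 2 more primaries than target).
    static Map<String, String> plan(Map<String, String> primaryByClient,
                                    Map<String, Set<String>> reachableByClient,
                                    Set<String> servers) {
        Map<String, Integer> load = countPerServer(primaryByClient, servers);
        Map<String, String> assignment = new TreeMap<>(primaryByClient);
        boolean moved = true;
        while (moved) { // each move lowers the sum of squared loads, so this ends
            moved = false;
            for (Map.Entry<String, String> entry : assignment.entrySet()) {
                String from = entry.getValue();
                Set<String> reachable = reachableByClient.getOrDefault(
                        entry.getKey(), Collections.<String>emptySet());
                String best = null;
                for (String candidate : reachable) {
                    if (!servers.contains(candidate)) {
                        continue; // ignore servers that are not current members
                    }
                    int candidateLoad = load.get(candidate);
                    if (load.get(from) - candidateLoad < 2) {
                        continue; // moving here would not improve the balance
                    }
                    // Prefer the least loaded target; break ties by name.
                    if (best == null
                            || candidateLoad < load.get(best)
                            || (candidateLoad == load.get(best)
                                && candidate.compareTo(best) < 0)) {
                        best = candidate;
                    }
                }
                if (best != null) {
                    load.merge(from, -1, Integer::sum);
                    load.merge(best, 1, Integer::sum);
                    entry.setValue(best);
                    moved = true;
                }
            }
        }
        return assignment;
    }
}
```

      In this sketch, a client that can reach only its current server is never moved, which is the situation the caveat above describes: the resulting counts can differ from the average by more than one when reachability is restricted.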

      Current Behavior

      Some aspects of existing behaviour are relevant or useful to the proposed implementation for rebalancing client subscription queues. Some behaviours that have been identified as particularly relevant are listed below:

      • Redundancy is automatically restored (or restoration is attempted) when the client detects that the number of redundant queues is less than the configured redundancy. If a server shuts down or a CacheClientProxy or CacheClientUpdater is closed, this happens immediately, but if the server is disconnected, it can take up to the configured ping-interval (default value 10 seconds) for the client to begin restoring redundancy.
      • The client contacts a locator for server/load information and then decides where best to create connections based on the size and primary/secondary status of any existing queues, choosing randomly if there are none. No consideration is given to server load due to other clients.
      • Extra redundant copies are not removed, but it should not normally be possible for actual redundancy to exceed the configured redundancy. One possible exception is durable client queues, where it is conceptually possible for a client to lose its connection to a server hosting a durable queue, create a new durable queue on a different server to restore redundancy, and then recover the connection to the first server before the configured durable-client-timeout has elapsed.
      • Primary queues are not relocated during automatic redundancy restoration. If a server is hosting the primary queue for a client, that server will remain the primary until the queue is closed or the server disconnects or stops.
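
      The redundancy behaviours listed above can be summarised in a small model. This is a sketch under stated assumptions, not Geode's actual logic: the class, the method and the idea of an ordered candidate list supplied by the caller are all hypothetical. It shows only three of the listed properties: secondaries are added when fewer exist than configured, extra copies are never removed, and the primary is never changed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical model for illustration only; not part of the Geode API.
final class RedundancyRestoreSketch {

    // Returns the servers on which new secondary queues would be created.
    // "candidates" is assumed to be already ordered by preference.
    static List<String> serversToAdd(String primary,
                                     Set<String> secondaries,
                                     List<String> candidates,
                                     int configuredRedundancy) {
        List<String> toAdd = new ArrayList<>();
        // Zero or negative means redundancy is already satisfied (or exceeded);
        // in that case nothing is added and nothing is removed.
        int missing = configuredRedundancy - secondaries.size();
        for (String server : candidates) {
            if (missing <= 0) {
                break;
            }
            // Skip servers that already host a queue for this client.
            if (server.equals(primary)
                    || secondaries.contains(server)
                    || toAdd.contains(server)) {
                continue;
            }
            toAdd.add(server);
            missing--;
        }
        return toAdd; // the primary is left untouched in every case
    }
}
```

      Because a model like this only ever adds secondaries, it cannot correct a skewed distribution of primaries on its own, which is the gap this ticket describes.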

          People

            Assignee: Unassigned
            Reporter: Donal Evans (donalevans)