Karaf / KARAF-7861

Configuration replication missed due to race condition in cellar

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: cellar
    • Labels: None
    • Environment: Karaf using Cellar in a clustered environment to replicate configuration updates.

    Description

      In a Karaf cluster using Cellar, and more specifically cellar-config, an update to a configuration on one node can fail to be replicated to another node.
      Investigation points to a race condition in which a node receives the ClusterConfigurationEvent before the ReplicatedMap entry has actually been replicated to that node. As a result, the node does not store the new configuration and its local version stays stale.

      The race condition starts here, on the emitting node:

      https://github.com/Jahia/karaf-cellar/blob/47b6984217953a5263f7e1e0da040f488cef3a3e/config/src/main/java/org/apache/karaf/cellar/config/LocalConfigurationListener.java#L119-L127

      and continues on the receiving node here:

      https://github.com/Jahia/karaf-cellar/blob/cellar-4.1.3-jahia-fixes/config/src/main/java/org/apache/karaf/cellar/config/ConfigurationEventHandler.java

      Cellar uses a Hazelcast ReplicatedMap to propagate configurations across the cluster, and the replication operation is asynchronous. Thus, if the ClusterConfigurationEvent is received before replication has finished on the target node, nothing happens: no error is detected and no retry is attempted.
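
The ordering problem described above can be illustrated with a small, self-contained sketch. It does not use Hazelcast or Cellar: the `Node` class, its `replica` and `local` maps, and `handleEvent` are hypothetical stand-ins, and the guard only loosely mimics the handler's "apply if the replicated value differs from the local one" check. A latch forces the replicated value to arrive after the event has been handled, which is the ordering suspected in this report.

```java
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;

public class ReplicationRaceSketch {

    /** Hypothetical stand-in for a cluster node: its replica of the shared map plus its local configuration. */
    static final class Node {
        final Map<String, String> replica = new ConcurrentHashMap<>();
        final Map<String, String> local = new ConcurrentHashMap<>();

        /** Applies the replicated value only if it differs from the local one; returns true if applied. */
        boolean handleEvent(String pid) {
            String clusterValue = replica.get(pid);
            String localValue = local.get(pid);
            if (clusterValue != null && !Objects.equals(clusterValue, localValue)) {
                local.put(pid, clusterValue);
                return true;
            }
            return false; // replica still holds the old value: the event is silently dropped
        }
    }

    static String run() {
        Node target = new Node();
        target.replica.put("my.pid", "v1");
        target.local.put("my.pid", "v1");

        // The asynchronous replication is held back until the event has been handled.
        CountDownLatch eventHandled = new CountDownLatch(1);
        Thread replication = new Thread(() -> {
            try {
                eventHandled.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            target.replica.put("my.pid", "v2");
        });
        replication.start();

        // The event reaches the target node before the replicated value does.
        boolean applied = target.handleEvent("my.pid");
        eventHandled.countDown();
        try {
            replication.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return "applied=" + applied
                + " replica=" + target.replica.get("my.pid")
                + " local=" + target.local.get("my.pid");
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints "applied=false replica=v2 local=v1"
    }
}
```

The replica eventually holds `v2`, but because no further event arrives, nothing re-reads it and the local configuration stays at `v1`.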

      To reproduce the problem, we can use breakpoints (thread-suspending ones):

      • The first simulates a slow replicate operation: set a breakpoint on the emitting node in com.hazelcast.replicatedmap.impl.operation.ReplicateUpdateOperation.run().
      • The second goes in the Cellar event handler that applies the replicated configuration, org.apache.karaf.cellar.config.ConfigurationEventHandler.handle(), at the line:
        if (!equals(clusterDictionary, localDictionary) && canDistributeConfig(localDictionary)) {

      Now update a configuration on the first node. On the target node, we can see that the configuration is not updated when the event is received.
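
Since the report notes that no retry takes place, one possible mitigation is a bounded re-read of the replicated value before giving up. The sketch below is only an illustration of that idea under stated assumptions: `awaitChangedValue`, its parameters, and the `Supplier` used to read the replica are hypothetical names, and nothing here is taken from Cellar or proposed in the report itself.

```java
import java.util.Objects;
import java.util.Optional;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

public class BoundedRetrySketch {

    /**
     * Re-reads the replicated value up to maxAttempts times, sleeping delayMillis
     * between attempts, until it differs from the local value.
     * Returns the new value, or empty if it never changed within the budget.
     */
    static Optional<String> awaitChangedValue(Supplier<String> readReplica,
                                              String localValue,
                                              int maxAttempts,
                                              long delayMillis) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            String replicated = readReplica.get();
            if (replicated != null && !Objects.equals(replicated, localValue)) {
                return Optional.of(replicated);
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(delayMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return Optional.empty();
                }
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Simulated replica: stale for the first two reads, updated on the third.
        AtomicInteger reads = new AtomicInteger();
        Supplier<String> replica = () -> reads.incrementAndGet() < 3 ? "v1" : "v2";
        System.out.println(awaitChangedValue(replica, "v1", 5, 10L)); // prints "Optional[v2]"
    }
}
```

A retry like this cannot tell a late replication apart from an event that legitimately carries no change, so it only narrows the window; it also delays the handling thread for up to `maxAttempts - 1` sleeps.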

          People

            Assignee: jbonofre Jean-Baptiste Onofré
            Reporter: jayblanc Jerome Blanchard