Bookkeeper / BOOKKEEPER-371

NPE in hedwig hub client causes hedwig hub to shut down.

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.2.0
    • Component/s: hedwig-client
    • Labels: None

    Description

      The hedwig client was connected to a remote region hub that restarted, which disconnected the channel. The hub then shut down with the following uncaught exception:

      2012-08-15 17:47:42,443 - ERROR - [pool-20-thread-1:TerminateJVMExceptionHandler@28] - Uncaught exception in thread pool-20-thread-1
      java.lang.NullPointerException
      at org.apache.hedwig.client.netty.HedwigClientImpl.getResponseHandlerFromChannel(HedwigClientImpl.java:323)
      at org.apache.hedwig.client.handlers.MessageConsumeCallback.operationFinished(MessageConsumeCallback.java:75)
      at org.apache.hedwig.client.handlers.MessageConsumeCallback.operationFinished(MessageConsumeCallback.java:41)
      at org.apache.hedwig.server.regions.RegionManager$1$1$1.operationFinished(RegionManager.java:208)
      at org.apache.hedwig.server.regions.RegionManager$1$1$1.operationFinished(RegionManager.java:202)
      at org.apache.hedwig.server.persistence.ReadAheadCache$PersistCallback.operationFinished(ReadAheadCache.java:194)
      at org.apache.hedwig.server.persistence.ReadAheadCache$PersistCallback.operationFinished(ReadAheadCache.java:171)
      at org.apache.hedwig.server.persistence.BookkeeperPersistenceManager$PersistOp$1.safeAddComplete(BookkeeperPersistenceManager.java:548)
      at org.apache.hedwig.zookeeper.SafeAsynBKCallback$AddCallback.addComplete(SafeAsynBKCallback.java:93)
      at org.apache.bookkeeper.client.PendingAddOp.submitCallback(PendingAddOp.java:165)
      at org.apache.bookkeeper.client.LedgerHandle.sendAddSuccessCallbacks(LedgerHandle.java:643)
      at org.apache.bookkeeper.client.PendingAddOp.writeComplete(PendingAddOp.java:159)
      at org.apache.bookkeeper.proto.PerChannelBookieClient.handleAddResponse(PerChannelBookieClient.java:577)
      at org.apache.bookkeeper.proto.PerChannelBookieClient$7.safeRun(PerChannelBookieClient.java:525)
      at org.apache.bookkeeper.util.SafeRunnable.run(SafeRunnable.java:31)
      at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
      at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
      at java.util.concurrent.FutureTask.run(FutureTask.java:166)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
      at java.lang.Thread.run(Thread.java:722)

      At 2012-08-15 17:47:42,443, the channel was disconnected as well.

      I believe the following code in the MessageConsumeCallback is causing this problem.

      Channel topicSubscriberChannel = client.getSubscriber().getChannelForTopic(topicSubscriber);
      HedwigClientImpl.getResponseHandlerFromChannel(topicSubscriberChannel).getSubscribeResponseHandler()
          .messageConsumed(messageConsumeData.msg);

      The channel was retrieved without checking whether it was closed, and then getPipeline().getLast() was called, which returned null and resulted in an NPE. Moreover, we need to check that the returned response handler is not null, because there is a race here if channel.close() is called after we retrieve the channel and before we call messageConsumed().

      I guess the same applies to the other places where we use this pattern.
      Does the above explanation seem right?
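
      The null checks suggested above could look roughly like the sketch below. It is only an illustration: FakeChannel, SubscribeHandler and consumeSafely are hypothetical stand-ins, not the real Netty Channel or Hedwig classes, and the assumption that the last pipeline handler becomes null after close() is taken from the description above. The handler is read once into a local variable, so a close() that lands between the check and the call cannot cause an NPE.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stand-ins only; none of these types are the real Netty or Hedwig API.
class SafeConsumeSketch {

    /** Stand-in for the subscribe response handler that records consumed messages. */
    static class SubscribeHandler {
        final List<String> consumed = new ArrayList<>();

        void messageConsumed(String msg) {
            consumed.add(msg);
        }
    }

    /** Stand-in for a channel whose last pipeline handler disappears on close (assumed). */
    static class FakeChannel {
        private final AtomicReference<SubscribeHandler> last;

        FakeChannel(SubscribeHandler handler) {
            this.last = new AtomicReference<>(handler);
        }

        /** Plays the role of getPipeline().getLast(); returns null once closed. */
        SubscribeHandler getLastHandler() {
            return last.get();
        }

        void close() {
            last.set(null);
        }
    }

    /**
     * Hands the message to the handler if the channel and handler are still present.
     * Returns false instead of throwing when either one is gone.
     */
    static boolean consumeSafely(FakeChannel channel, String msg) {
        if (channel == null) {
            return false;
        }
        // Read the handler exactly once; later checks and the call use this local copy.
        SubscribeHandler handler = channel.getLastHandler();
        if (handler == null) {
            return false;
        }
        handler.messageConsumed(msg);
        return true;
    }

    public static void main(String[] args) {
        FakeChannel channel = new FakeChannel(new SubscribeHandler());
        System.out.println(consumeSafely(channel, "m1")); // prints "true"
        channel.close();
        System.out.println(consumeSafely(channel, "m2")); // prints "false"
        System.out.println(consumeSafely(null, "m3"));    // prints "false"
    }
}
```

      Returning a boolean here is just one option; the real fix might instead log the dropped consume or fail the callback, depending on what the caller expects.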

      Attachments

        1. BK-371.patch
          21 kB
          Aniruddha
        2. BK-371.patch
          21 kB
          Aniruddha
        3. BOOKKEEPER-371.diff
          20 kB
          Sijie Guo

            People

              Assignee: i0exception Aniruddha
              Reporter: i0exception Aniruddha
              Votes: 0
              Watchers: 4