Atlas / ATLAS-1111

Data loss is observed when Atlas is restarted while hive_table metadata ingestion into the Kafka topic is in progress

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • 0.8-incubating
    • None
    • None

    Description

      During Atlas shutdown, the graph is shut down first, and only then are services such as NotificationHookConsumer stopped. In the window after the graph has been shut down but before NotificationHookConsumer has stopped, message handling fails because the graph is unavailable, yet NotificationHookConsumer still commits each message (updating the offset). As a result, the messages consumed during this window are lost.
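A minimal Python simulation of the failure window described above, assuming a hypothetical in-memory topic, a `Graph` stand-in, and a `consume_buggy` helper (none of these are Atlas classes). Every offset is committed whether or not processing succeeded, so messages handled after the graph closes are skipped on restart:

```python
class Graph:
    """Hypothetical stand-in for the Atlas graph store."""

    def __init__(self):
        self.open = True
        self.entities = []

    def save(self, msg):
        if not self.open:
            raise RuntimeError("graph is shut down")
        self.entities.append(msg)


def consume_buggy(topic, committed, end, graph):
    """Process topic[committed:end] and return the new committed offset.

    Mirrors the reported bug: the offset advances even when processing fails.
    """
    for offset in range(committed, end):
        try:
            graph.save(topic[offset])
        except RuntimeError:
            pass                    # the failure is swallowed...
        committed = offset + 1      # ...and the offset is committed anyway
    return committed


topic = ["t1", "t2", "t3", "t4"]
graph = Graph()
committed = consume_buggy(topic, 0, 2, graph)          # t1, t2 stored
graph.open = False                                     # stop: graph shut down first
committed = consume_buggy(topic, committed, 4, graph)  # consumer still running: t3, t4 fail
graph.open = True                                      # restart
committed = consume_buggy(topic, committed, len(topic), graph)  # resumes at offset 4
print(graph.entities)  # ['t1', 't2']: t3 and t4 are lost
```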

      Attachments

        1. ATLAS-1111.patch
          12 kB
          Shwetha GS
        2. ATLAS-1111.1.patch
          17 kB
          Suma Shivaprasad
        3. ATLAS-1111.2.patch
          24 kB
          Shwetha GS
        4. ATLAS-1111.3.patch
          51 kB
          Shwetha GS
        5. ATLAS-1111.4.patch
          51 kB
          Shwetha GS
        6. ATLAS-1111.5.patch
          55 kB
          Suma Shivaprasad

        Activity

          shwethags Shwetha GS added a comment -

          1. Stopping services before graph shutdown
          2. Adding retries in hook consumer
          3. Logging only failed messages to another file
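The first two changes can be sketched as follows, assuming hypothetical `Graph`, `HookConsumer` and `stop_atlas` stand-ins (not the real Atlas classes) and an illustrative `max_retries` parameter. Services are stopped before the graph is shut down, and each message is retried a bounded number of times before it is reported as failed:

```python
import time


class Graph:
    """Hypothetical graph stand-in that can fail a few times before succeeding."""

    def __init__(self, transient_failures=0):
        self.open = True
        self.entities = []
        self.transient_failures = transient_failures

    def save(self, msg):
        if not self.open:
            raise RuntimeError("graph is shut down")
        if self.transient_failures > 0:
            self.transient_failures -= 1
            raise RuntimeError("transient failure")
        self.entities.append(msg)

    def shutdown(self):
        self.open = False


class HookConsumer:
    """Hypothetical stand-in for NotificationHookConsumer."""

    def __init__(self, graph, max_retries=3, backoff_s=0.0):
        self.graph = graph
        self.max_retries = max_retries
        self.backoff_s = backoff_s
        self.running = True

    def handle(self, msg):
        """Try to store msg up to max_retries times; return True on success."""
        for attempt in range(1, self.max_retries + 1):
            try:
                self.graph.save(msg)
                return True
            except RuntimeError:
                if attempt < self.max_retries:
                    time.sleep(self.backoff_s)  # wait before the next attempt
        return False  # the caller would log this message as failed

    def stop(self):
        self.running = False


def stop_atlas(services, graph):
    """Stop consumers and other services first, then shut down the graph."""
    for service in services:
        service.stop()
    graph.shutdown()


graph = Graph(transient_failures=2)
consumer = HookConsumer(graph, max_retries=3)
print(consumer.handle("t1"))  # True: succeeds on the third attempt
stop_atlas([consumer], graph)
```

Stopping the consumer before the graph means no message is pulled while the graph is unavailable; the retries only help with transient failures, not with a graph that is already closed.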

          suma.shivaprasad Suma Shivaprasad added a comment -

          Fixed failing tests in webapp

          shwethags Shwetha GS added a comment -

          End to end works with this patch

          suma.shivaprasad Suma Shivaprasad added a comment -

          LGTM, +1. Should atlas.kafka.consumer.timeout.ms=4000 be set in distro/atlas-application.properties as well?

          shwethags Shwetha GS added a comment -

          Patch with the unit tests fixed, and tested end to end.

          kafkaconsumer.next() always returns the next message, whether or not auto commit is enabled. With auto commit disabled, the offset has to be committed manually instead of being committed automatically on next(). The last committed offset is used across consumer restarts.

          During a graceful shutdown, all message processing failed because the graph had already been shut down, yet we still committed the offsets of the failed messages. After the restart, Atlas resumed from the last committed offset, skipping the messages that had failed during shutdown, and hence those messages were lost.

          The patch has the following changes:
          1. No commit in case of message failure
          2. Failed messages are written to a separate log file
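The commit-on-success rule can be sketched in Python as below, assuming a hypothetical `Graph` stand-in, an in-memory list as the topic, and an illustrative logger name `atlas.failed_messages` (the real Atlas logger and file names may differ). Because a Kafka committed offset is a position (the next offset to read), committing a later message would implicitly skip an earlier failed one, so this sketch stops at the first failure and leaves its offset uncommitted:

```python
import logging

# Hypothetical dedicated logger for failed messages; in a real deployment the
# logging configuration would route it to its own file.
failed_log = logging.getLogger("atlas.failed_messages")


class Graph:
    """Hypothetical stand-in for the Atlas graph store."""

    def __init__(self):
        self.open = True
        self.entities = []

    def save(self, msg):
        if not self.open:
            raise RuntimeError("graph is shut down")
        self.entities.append(msg)


def consume(topic, committed, end, graph):
    """Process topic[committed:end]; commit an offset only after success."""
    for offset in range(committed, end):
        try:
            graph.save(topic[offset])
        except RuntimeError:
            failed_log.error("failed message at offset %d: %s", offset, topic[offset])
            return committed      # leave the offset uncommitted; resume here later
        committed = offset + 1    # commit only after the message is stored
    return committed


topic = ["t1", "t2", "t3", "t4"]
graph = Graph()
committed = consume(topic, 0, 2, graph)           # t1, t2 stored; committed == 2
graph.open = False                                # graph shut down mid-stream
committed = consume(topic, committed, 4, graph)   # t3 fails; committed stays 2
graph.open = True                                 # restart
committed = consume(topic, committed, 4, graph)   # resumes at t3
print(graph.entities)  # ['t1', 't2', 't3', 't4']
```

Whether the real consumer stops, pauses, or moves on after logging a failure is not stated in this issue; the stop-at-first-failure behaviour here is a simplification.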

          suma.shivaprasad Suma Shivaprasad added a comment -

          Uploading patch with minor changes:

          1. Changed the destruction order so that GuiceServletConfig is destroyed before the Log4j listener; logs about stopping services now show up
          2. Added a logger for failed messages in tests

          suma.shivaprasad Suma Shivaprasad added a comment - edited

          Tested with the above patch; notifications are no longer missed after a restart. +1

          suma.shivaprasad Suma Shivaprasad added a comment -

          Committed. Thanks shwethags

          madhan Madhan Neethiraj added a comment -

          Committed to 0.7-incubating branch: http://git-wip-us.apache.org/repos/asf/incubator-atlas/commit/9ea1ad6d

          People

            Assignee: Shwetha GS
            Reporter: Sharmadha S
            Votes: 0
            Watchers: 3

            Dates

              Created:
              Updated:
              Resolved: