  1. Samza
  2. SAMZA-1044

Checkpointing requires log.cleaner.enable=true

Details

    • Type: Bug
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Component/s: docs
    • Environment: linux

    Description

      We're running Samza 0.9.1 with Kafka 0.8.2.1, which defaults to log.cleaner.enable=false. We didn't think we needed to enable it, as we never created any topics with cleanup.policy=compact. However, this morning we had a disk alert, and when I looked at the broker that triggered it, one of the Samza checkpoint topics was consuming 29GB in the /logs folder.

      Long story short, I eventually figured out that all of the checkpoint topics had been created with cleanup.policy=compact and, with the log cleaner disabled, were growing without bound. I set log.cleaner.enable=true on each broker and restarted them. Within minutes, the 29GB was reduced to 200-300KB.

      I thought I must have missed this when I created our jobs with checkpointing enabled, so I went and scoured the docs. There's no mention of the log.cleaner.enable setting within the documentation (unless I missed it again).

      I should add that we've been running most of these jobs for about a year, and each deploy took longer and longer to transition from ACCEPTED to RUNNING in the YARN cluster. Eventually it was taking 10-15 minutes per job, and we didn't understand why. After bouncing our staging cluster with log.cleaner.enable=true (and letting the log cleaner finish its work), I redeployed one of our jobs, and it once again took 15-20 seconds to go from ACCEPTED to RUNNING.

      Please mention in the documentation that log.cleaner.enable must be set to true for checkpointing to work correctly.
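
      As a rough illustration of the condition described above, here is a minimal Python sketch that flags compacted topics on a broker whose log cleaner is off. It is not part of Samza or Kafka. The helper names, the sample server.properties text, and the topic names are all hypothetical, and the topic-to-policy mapping is assumed to have been gathered separately by whatever means you normally use to inspect topic configuration. The default of false for log.cleaner.enable reflects the Kafka 0.8.2.1 behaviour reported here.

```python
def parse_properties(text):
    """Parse simple key=value .properties text into a dict.

    Simplification: only '=' separators are handled; comment and blank
    lines are skipped.
    """
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def cleaner_enabled(broker_props, default=False):
    """Return True if log.cleaner.enable is set to true.

    The default of False is an assumption matching the Kafka 0.8.2.1
    default described in this issue.
    """
    raw = broker_props.get("log.cleaner.enable")
    if raw is None:
        return default
    return raw.lower() == "true"


def topics_at_risk(broker_props, topic_policies):
    """List compacted topics that would never be cleaned on this broker."""
    if cleaner_enabled(broker_props):
        return []
    return sorted(
        topic for topic, policy in topic_policies.items() if policy == "compact"
    )


# Hypothetical inputs, for illustration only.
server_properties = """
# broker settings
broker.id=1
log.dirs=/logs
"""
topic_policies = {
    "example-checkpoint-topic": "compact",  # hypothetical checkpoint topic
    "example-input-topic": "delete",        # hypothetical regular topic
}

at_risk = topics_at_risk(parse_properties(server_properties), topic_policies)
print(at_risk)
```

      With log.cleaner.enable=true added to the sample properties, the same call returns an empty list.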

          People

            Assignee: Unassigned
            Reporter: Mark Mindenhall (mmindenhall)
            Votes: 0
            Watchers: 1
