assert in setup_random

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.4.7, 3.5.1
    • Fix Version/s: 3.4.8, 3.5.2, 3.6.0
    • Component/s: c client
    • Labels: None

    Description

      We've started seeing an assert failing inside setup_random at line 537:

       528 static void setup_random()
       529 {
       530 #ifndef _WIN32          // TODO: better seed
       531     int seed;
       532     int fd = open("/dev/urandom", O_RDONLY);
       533     if (fd == -1) {
       534         seed = getpid();
       535     } else {
       536         int rc = read(fd, &seed, sizeof(seed));
       537         assert(rc == sizeof(seed));
       538         close(fd);
       539     }
       540     srandom(seed);
       541     srand48(seed);
       542 #endif
      

      The core files show:

      Program terminated with signal 6, Aborted.
      #0 0x00007f9ff665a0d5 in raise () from /lib/x86_64-linux-gnu/libc.so.6
      #0 0x00007f9ff665a0d5 in raise () from /lib/x86_64-linux-gnu/libc.so.6
      #1 0x00007f9ff665d83b in abort () from /lib/x86_64-linux-gnu/libc.so.6
      #2 0x00007f9ff6652d9e in ?? () from /lib/x86_64-linux-gnu/libc.so.6
      #3 0x00007f9ff6652e42 in __assert_fail () from /lib/x86_64-linux-gnu/libc.so.6
      #4 0x00007f9ff8e4070a in setup_random () at src/zookeeper.c:476
      #5 0x00007f9ff8e40d76 in resolve_hosts (zh=0x7f9fe14de400, hosts_in=0x7f9fd700f400 "10.26.200.6:2181,10.26.200.7:2181,10.26.200.8:2181", avec=0x7f9fd87fab60) at src/zookeeper.c:730
      #6 0x00007f9ff8e40e87 in update_addrs (zh=0x7f9fe14de400) at src/zookeeper.c:801
      #7 0x00007f9ff8e44176 in zookeeper_interest (zh=0x7f9fe14de400, fd=0x7f9fd87fac4c, interest=0x7f9fd87fac50, tv=0x7f9fd87fac80) at src/zookeeper.c:1980
      #8 0x00007f9ff8e553f5 in do_io (v=0x7f9fe14de400) at src/mt_adaptor.c:379
      #9 0x00007f9ff804de9a in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
      #10 0x00007f9ff671738d in clone () from /lib/x86_64-linux-gnu/libc.so.6
      #11 0x0000000000000000 in ?? ()

      I'm not sure what the underlying cause is, but POSIX allows read(2) to return fewer bytes than requested (a short read), so any program MUST check for short reads rather than assert on the full count.
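
      For illustration only, here is a minimal sketch of how the seed read could tolerate short reads and EINTR instead of asserting. This is not the attached patch: the helper names read_full and pick_seed are hypothetical, and falling back to getpid() on failure is an assumption that mirrors the existing fallback for a failed open().

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read exactly len bytes from fd.
 * Retries on short reads and on EINTR.
 * Returns 0 on success, -1 on a read error or premature end of file. */
static int read_full(int fd, void *buf, size_t len)
{
    unsigned char *p = buf;
    size_t done = 0;

    while (done < len) {
        ssize_t rc = read(fd, p + done, len - done);
        if (rc > 0) {
            done += (size_t)rc;   /* partial progress: keep reading */
        } else if (rc == 0) {
            return -1;            /* EOF before len bytes arrived */
        } else if (errno != EINTR) {
            return -1;            /* real error; EINTR is simply retried */
        }
    }
    return 0;
}

/* Hypothetical helper: choose a seed from /dev/urandom, falling back to
 * getpid() if the device cannot be opened or fully read (an assumption). */
static unsigned int pick_seed(void)
{
    unsigned int seed = (unsigned int)getpid();
    int fd = open("/dev/urandom", O_RDONLY);

    if (fd != -1) {
        unsigned int tmp;
        if (read_full(fd, &tmp, sizeof(tmp)) == 0) {
            seed = tmp;
        }
        close(fd);
    }
    return seed;
}
```

      With helpers like these, setup_random could call srandom(pick_seed()) and never abort the process when read(2) returns fewer bytes than requested.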

      Has anyone else encountered this issue? We are seeing it rather frequently, which is concerning.

      Attachments

        1. ZOOKEEPER-2311.patch
          1 kB
          Marshall McMullen
        2. ZOOKEEPER-2311.patch
          1 kB
          Marshall McMullen

            People

              marshall Marshall McMullen