MESOS-3975

SSL build of Mesos causes a flaky test suite.


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.26.0
    • Fix Version/s: 0.26.0
    • None
    • Environment: CentOS 7.1, Kernel 3.10.0-229.20.1.el7.x86_64, gcc 4.8.3, Docker 1.9

    • Sprint: Mesosphere Sprint 23
    • Story Points: 5

    Description

      When running the test suite of an SSL-enabled build of Mesos on CentOS 7.1, I see spurious test failures that I have so far been unable to reproduce reliably.

      The following tests failed for me during complete runs of the suite, but passed when I ran each of them individually and repeatedly.

      DockerTest.ROOT_DOCKER_CheckPortResource
      
      ContainerizerTest.ROOT_CGROUPS_BalloonFramework
      
      [ RUN      ] LinuxFilesystemIsolatorTest.ROOT_ChangeRootFilesystemCommandExecutor
      2015-11-20 19:08:38,826:21380(0x7fa10d5f2700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:53444] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
      + /home/vagrant/mesos/build/src/mesos-containerizer mount --help=false --operation=make-rslave --path=/
      + grep -E /tmp/LinuxFilesystemIsolatorTest_ROOT_ChangeRootFilesystemCommandExecutor_Tz7P8c/.+ /proc/self/mountinfo
      + grep -v 2b98025c-74f1-41d2-b35a-ce2cdfae347e
      + cut '-d ' -f5
      + xargs --no-run-if-empty umount -l
      + mount -n --rbind /tmp/LinuxFilesystemIsolatorTest_ROOT_ChangeRootFilesystemCommandExecutor_Tz7P8c/provisioner/containers/2b98025c-74f1-41d2-b35a-ce2cdfae347e/backends/copy/rootfses/bed11080-474b-4c69-8e7f-0ab85e895b0d /tmp/LinuxFilesystemIsolatorTest_ROOT_ChangeRootFilesystemCommandExecutor_Tz7P8c/slaves/830e842e-c36a-4e4c-bff4-5b9568d7df12-S0/frameworks/830e842e-c36a-4e4c-bff4-5b9568d7df12-0000/executors/c735be54-c47f-4645-bfc1-2f4647e2cddb/runs/2b98025c-74f1-41d2-b35a-ce2cdfae347e/.rootfs
      Could not load cert file
      ../../src/tests/containerizer/filesystem_isolator_tests.cpp:354: Failure
      Value of: statusRunning.get().state()
        Actual: TASK_FAILED
      Expected: TASK_RUNNING
      2015-11-20 19:08:42,164:21380(0x7fa10d5f2700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:53444] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
      2015-11-20 19:08:45,501:21380(0x7fa10d5f2700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:53444] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
      2015-11-20 19:08:48,837:21380(0x7fa10d5f2700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:53444] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
      2015-11-20 19:08:52,174:21380(0x7fa10d5f2700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:53444] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
      ../../src/tests/containerizer/filesystem_isolator_tests.cpp:355: Failure
      Failed to wait 15secs for statusFinished
      ../../src/tests/containerizer/filesystem_isolator_tests.cpp:349: Failure
      Actual function call count doesn't match EXPECT_CALL(sched, statusUpdate(&driver, _))...
               Expected: to be called twice
                 Actual: called once - unsatisfied and active
      2015-11-20 19:08:55,511:21380(0x7fa10d5f2700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:53444] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
      *** Aborted at 1448046536 (unix time) try "date -d @1448046536" if you are using GNU date ***
      PC: @                0x0 (unknown)
      *** SIGSEGV (@0x0) received by PID 21380 (TID 0x7fa1549e68c0) from PID 0; stack trace: ***
          @     0x7fa141796fbb (unknown)
          @     0x7fa14179b341 (unknown)
          @     0x7fa14f096130 (unknown)
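
One way to follow up on failures like the ones above is to collect the failed test names from the log of a complete run and rerun each of them in isolation. The sketch below is only an illustration: it assumes the complete run was saved to a log file (the name `full-run.log` is hypothetical), and it relies on the standard Google Test `[  FAILED  ]` lines and the `--gtest_filter`, `--gtest_repeat`, and `--gtest_break_on_failure` flags. It prints the rerun commands rather than executing them.

```shell
#!/bin/sh
# Sketch: extract the tests that Google Test reported as failed and print
# one rerun command per test. Nothing is executed by these helpers.

# Read a Google Test log on stdin and print the unique failed test names.
failed_tests() {
  # Matches lines such as "[  FAILED  ] Suite.Name (123 ms)" and skips the
  # summary header "[  FAILED  ] 2 tests, listed below:".
  sed -n 's|^ *\[  FAILED  \] \([A-Za-z0-9_]*\.[A-Za-z0-9_/]*\).*|\1|p' | sort -u
}

# Print a command that repeats a single test (default: 100 iterations)
# and stops at the first failing iteration.
rerun_command() {
  printf 'sudo ./bin/mesos-tests.sh --gtest_filter=%s --gtest_repeat=%s --gtest_break_on_failure\n' "$1" "${2:-100}"
}

# Usage (the log file name is an assumption):
#   failed_tests < full-run.log | while read -r t; do rerun_command "$t"; done
```

Printing the commands first makes it easy to review which tests will be repeated before spending time on root-level test runs.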
      

      Vagrantfile generator:

      cat << EOF > Vagrantfile
      # -*- mode: ruby -*-
      # vi: set ft=ruby :
      Vagrant.configure(2) do |config|
        # Disable shared folder to prevent certain kernel module dependencies.
        config.vm.synced_folder ".", "/vagrant", disabled: true
      
        config.vm.hostname = "centos71"
      
        config.vm.box = "bento/centos-7.1"
      
        config.vm.provider "virtualbox" do |vb|
          vb.memory = 16384
          vb.cpus = 8
        end
      
        config.vm.provider "vmware_fusion" do |vb|
          vb.memory = 9216
          vb.cpus = 4
        end
      
        config.vm.provision "shell", inline: <<-SHELL
      
           sudo yum -y update systemd
      
           sudo yum install -y tar wget
           sudo wget http://repos.fedorapeople.org/repos/dchen/apache-maven/epel-apache-maven.repo -O /etc/yum.repos.d/epel-apache-maven.repo
      
           sudo yum groupinstall -y "Development Tools"
           sudo yum install -y apache-maven python-devel java-1.7.0-openjdk-devel zlib-devel libcurl-devel openssl-devel cyrus-sasl-devel cyrus-sasl-md5 apr-devel subversion-devel apr-util-devel
      
           sudo yum install -y libevent-devel
      
           sudo yum install -y git
      
           sudo yum install -y docker
           sudo service docker start
           sudo docker info
      
           #sudo wget -qO- https://get.docker.com/ | sh
      
        SHELL
      end
      EOF
      
      vagrant up
      vagrant reload
      
      vagrant ssh -c "
      git clone https://github.com/apache/mesos.git mesos
      cd mesos
      git checkout -b 0.26.0-rc1 0.26.0-rc1
      
      ./bootstrap
      mkdir build
      cd build
      
      ../configure --enable-libevent --enable-ssl
      # Build the tests without running them (an empty filter matches no tests).
      GTEST_FILTER='' make check
      sudo ./bin/mesos-tests.sh
      "
      

            People

              kaysoky Joseph Wu
              tillt Till Toenshoff
              Till Toenshoff Till Toenshoff
              Votes: 0
              Watchers: 6