Uploaded image for project: 'Mesos'
  1. Mesos
  2. MESOS-7729

ExamplesTest.DynamicReservationFramework is flaky

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • None
    • 1.5.0
    • None

    Description

      Observed this on ASF CI

      [ RUN      ] ExamplesTest.DynamicReservationFramework
      Using temporary directory '/tmp/ExamplesTest_DynamicReservationFramework_uPVIaN'
      /mesos/mesos-1.4.0/src/tests/dynamic_reservation_framework_test.sh: line 19: /mesos/mesos-1.4.0/_build/src/colors.sh: No such file or directory
      /mesos/mesos-1.4.0/src/tests/dynamic_reservation_framework_test.sh: line 20: /mesos/mesos-1.4.0/_build/src/atexit.sh: No such file or directory
      WARNING: Logging before InitGoogleLogging() is written to STDERR
      I0627 16:04:20.661948  8847 process.cpp:1282] libprocess is initialized on 172.17.0.3:37113 with 16 worker threads
      I0627 16:04:20.662199  8847 logging.cpp:199] Logging to STDERR
      I0627 16:04:20.674317  8847 leveldb.cpp:174] Opened db in 3.343216ms
      I0627 16:04:20.675655  8847 leveldb.cpp:181] Compacted db in 1.264481ms
      I0627 16:04:20.675797  8847 leveldb.cpp:196] Created db iterator in 89655ns
      I0627 16:04:20.675829  8847 leveldb.cpp:202] Seeked to beginning of db in 5551ns
      I0627 16:04:20.675848  8847 leveldb.cpp:271] Iterated through 0 keys in the db in 1133ns
      I0627 16:04:20.676103  8847 replica.cpp:779] Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned
      I0627 16:04:20.680465  8873 recover.cpp:451] Starting replica recovery
      I0627 16:04:20.681649  8873 recover.cpp:477] Replica is in EMPTY status
      I0627 16:04:20.682160  8847 local.cpp:272] Creating default 'local' authorizer
      I0627 16:04:20.684504  8884 replica.cpp:676] Replica in EMPTY status received a broadcasted recover request from __req_res__(1)@172.17.0.3:37113
      I0627 16:04:20.685750  8882 recover.cpp:197] Received a recover response from a replica in EMPTY status
      I0627 16:04:20.686617  8877 recover.cpp:568] Updating replica status to STARTING
      I0627 16:04:20.688508  8877 leveldb.cpp:304] Persisting metadata (8 bytes) to leveldb took 741914ns
      I0627 16:04:20.688544  8881 master.cpp:438] Master c1c3a180-5bd3-42fa-b84e-e2c30aba7364 (089cb2cc2625) started on 172.17.0.3:37113
      I0627 16:04:20.688551  8877 replica.cpp:322] Persisted replica status to STARTING
      I0627 16:04:20.689095  8878 recover.cpp:477] Replica is in STARTING status
      I0627 16:04:20.688582  8881 master.cpp:440] Flags at startup: --acls="permissive: true
      register_frameworks {
        principals {
          type: ANY
        }
        roles {
          type: SOME
          values: "test"
        }
      }
      " --agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" --allocation_interval="1secs" --allocator="HierarchicalDRF" --authenticate_agents="false" --authenticate_frameworks="false" --authenticate_http_frameworks="false" --authenticate_http_readonly="false" --authenticate_http_readwrite="false" --authenticators="crammd5" --authorizers="local" --credentials="/tmp/ExamplesTest_DynamicReservationFramework_uPVIaN/credentials" --filter_gpu_resources="true" --framework_sorter="drf" --help="false" --hostname_lookup="true" --http_authenticators="basic" --initialize_driver_logging="true" --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" --max_agent_ping_timeouts="5" --max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" --max_unreachable_tasks_per_framework="1000" --port="5050" --quiet="false" --recovery_agent_removal_limit="100%" --registry="replicated_log" --registry_fetch_timeout="1mins" --registry_gc_interval="15mins" --registry_max_agent_age="2weeks" --registry_max_agent_count="102400" --registry_store_timeout="20secs" --registry_strict="false" --root_submissions="true" --user_sorter="drf" --version="false" --webui_dir="/mesos/mesos-1.4.0/src/webui" --work_dir="/tmp/mesos-9H3Est/master" --zk_session_timeout="10secs"
      I0627 16:04:20.689460  8881 master.cpp:492] Master allowing unauthenticated frameworks to register
      I0627 16:04:20.689476  8881 master.cpp:506] Master allowing unauthenticated agents to register
      I0627 16:04:20.689482  8881 master.cpp:520] Master allowing HTTP frameworks to register without authentication
      I0627 16:04:20.689494  8881 credentials.hpp:37] Loading credentials for authentication from '/tmp/ExamplesTest_DynamicReservationFramework_uPVIaN/credentials'
      W0627 16:04:20.689620  8881 credentials.hpp:52] Permissions on credentials file '/tmp/ExamplesTest_DynamicReservationFramework_uPVIaN/credentials' are too open; it is recommended that your credentials file is NOT accessible by others
      I0627 16:04:20.689817  8881 master.cpp:562] Using default 'crammd5' authenticator
      I0627 16:04:20.690021  8881 authenticator.cpp:520] Initializing server SASL
      I0627 16:04:20.690615  8878 replica.cpp:676] Replica in STARTING status received a broadcasted recover request from __req_res__(2)@172.17.0.3:37113
      I0627 16:04:20.691058  8877 recover.cpp:197] Received a recover response from a replica in STARTING status
      I0627 16:04:20.691951  8885 recover.cpp:568] Updating replica status to VOTING
      I0627 16:04:20.692548  8881 auxprop.cpp:73] Initialized in-memory auxiliary property plugin
      I0627 16:04:20.692679  8881 master.cpp:642] Authorization enabled
      I0627 16:04:20.692991  8872 leveldb.cpp:304] Persisting metadata (8 bytes) to leveldb took 525491ns
      I0627 16:04:20.693245  8872 replica.cpp:322] Persisted replica status to VOTING
      I0627 16:04:20.693567  8882 recover.cpp:582] Successfully joined the Paxos group
      I0627 16:04:20.693084  8876 hierarchical.cpp:169] Initialized hierarchical allocator process
      I0627 16:04:20.693971  8882 recover.cpp:466] Recover process terminated
      I0627 16:04:20.693136  8885 whitelist_watcher.cpp:77] No whitelist given
      I0627 16:04:20.695565  8847 resolver.cpp:69] Creating default secret resolver
      I0627 16:04:20.696056  8847 containerizer.cpp:230] Using isolation: filesystem/posix,posix/cpu,posix/mem,network/cni,environment_secret
      W0627 16:04:20.697029  8847 backend.cpp:76] Failed to create 'aufs' backend: AufsBackend requires root privileges
      W0627 16:04:20.697257  8847 backend.cpp:76] Failed to create 'bind' backend: BindBackend requires root privileges
      I0627 16:04:20.697340  8847 provisioner.cpp:255] Using default backend 'copy'
      I0627 16:04:20.702487  8878 slave.cpp:249] Mesos agent started on (1)@172.17.0.3:37113
      I0627 16:04:20.702535  8878 slave.cpp:250] Flags at startup: --acls="permissive: true
      register_frameworks {
        principals {
          type: ANY
        }
        roles {
          type: SOME
          values: "test"
        }
      }
      " --appc_simple_discovery_uri_prefix="http://" --appc_store_dir="/tmp/mesos/store/appc" --authenticate_http_readonly="false" --authenticate_http_readwrite="false" --authenticatee="crammd5" --authentication_backoff_factor="1secs" --authorizer="local" --cgroups_cpu_enable_pids_and_tids_count="false" --cgroups_enable_cfs="false" --cgroups_hierarchy="/sys/fs/cgroup" --cgroups_limit_swap="false" --cgroups_root="mesos" --container_disk_watch_interval="15secs" --containerizers="mesos" --default_role="*" --disk_watch_interval="1mins" --docker="docker" --docker_kill_orphans="true" --docker_registry="https://registry-1.docker.io" --docker_remove_delay="6hrs" --docker_socket="/var/run/docker.sock" --docker_stop_timeout="0ns" --docker_store_dir="/tmp/mesos/store/docker" --docker_volume_checkpoint_dir="/var/run/mesos/isolators/docker/volume" --enforce_container_disk_quota="false" --executor_registration_timeout="1mins" --executor_reregistration_timeout="2secs" --executor_shutdown_grace_period="5secs" --fetcher_cache_dir="/tmp/mesos-9H3Est/agents/0/fetch" --fetcher_cache_size="2GB" --frameworks_home="" --gc_delay="1weeks" --gc_disk_headroom="0.1" --hadoop_home="" --help="false" --hostname_lookup="true" --http_command_executor="false" --http_heartbeat_interval="30secs" --initialize_driver_logging="true" --isolation="filesystem/posix,posix/cpu,posix/mem" --launcher="posix" --launcher_dir="/mesos/mesos-1.4.0/_build/src" --logbufsecs="0" --logging_level="INFO" --max_completed_executors_per_framework="150" --oversubscribed_resources_interval="15secs" --perf_duration="10secs" --perf_interval="1mins" --port="5051" --qos_correction_interval_min="0ns" --quiet="false" --recover="reconnect" --recovery_timeout="15mins" --registration_backoff_factor="1secs" --revocable_cpu_low_priority="true" --runtime_dir="/tmp/mesos-9H3Est/agents/0/run" --sandbox_directory="/mnt/mesos/sandbox" --strict="true" --switch_user="true" --systemd_enable_support="true" --systemd_runtime_directory="/run/systemd/system" --version="false" --work_dir="/tmp/mesos-9H3Est/agents/0/work"
      I0627 16:04:20.706677  8847 resolver.cpp:69] Creating default secret resolver
      I0627 16:04:20.707231  8847 containerizer.cpp:230] Using isolation: filesystem/posix,posix/cpu,posix/mem,network/cni,environment_secret
      W0627 16:04:20.708060  8847 backend.cpp:76] Failed to create 'aufs' backend: AufsBackend requires root privileges
      W0627 16:04:20.708179  8847 backend.cpp:76] Failed to create 'bind' backend: BindBackend requires root privileges
      I0627 16:04:20.708231  8847 provisioner.cpp:255] Using default backend 'copy'
      I0627 16:04:20.710345  8881 master.cpp:2161] Elected as the leading master!
      I0627 16:04:20.710400  8881 master.cpp:1700] Recovering from registrar
      I0627 16:04:20.710644  8886 registrar.cpp:345] Recovering registrar
      I0627 16:04:20.711997  8881 log.cpp:536] Attempting to start the writer
      I0627 16:04:20.714257  8880 slave.cpp:249] Mesos agent started on (2)@172.17.0.3:37113
      I0627 16:04:20.714516  8880 slave.cpp:250] Flags at startup: --acls="permissive: true
      register_frameworks {
        principals {
          type: ANY
        }
        roles {
          type: SOME
          values: "test"
        }
      }
      " --appc_simple_discovery_uri_prefix="http://" --appc_store_dir="/tmp/mesos/store/appc" --authenticate_http_readonly="false" --authenticate_http_readwrite="false" --authenticatee="crammd5" --authentication_backoff_factor="1secs" --authorizer="local" --cgroups_cpu_enable_pids_and_tids_count="false" --cgroups_enable_cfs="false" --cgroups_hierarchy="/sys/fs/cgroup" --cgroups_limit_swap="false" --cgroups_root="mesos" --container_disk_watch_interval="15secs" --containerizers="mesos" --default_role="*" --disk_watch_interval="1mins" --docker="docker" --docker_kill_orphans="true" --docker_registry="https://registry-1.docker.io" --docker_remove_delay="6hrs" --docker_socket="/var/run/docker.sock" --docker_stop_timeout="0ns" --docker_store_dir="/tmp/mesos/store/docker" --docker_volume_checkpoint_dir="/var/run/mesos/isolators/docker/volume" --enforce_container_disk_quota="false" --executor_registration_timeout="1mins" --executor_reregistration_timeout="2secs" --executor_shutdown_grace_period="5secs" --fetcher_cache_dir="/tmp/mesos-9H3Est/agents/1/fetch" --fetcher_cache_size="2GB" --frameworks_home="" --gc_delay="1weeks" --gc_disk_headroom="0.1" --hadoop_home="" --help="false" --hostname_lookup="true" --http_command_executor="false" --http_heartbeat_interval="30secs" --initialize_driver_logging="true" --isolation="filesystem/posix,posix/cpu,posix/mem" --launcher="posix" --launcher_dir="/mesos/mesos-1.4.0/_build/src" --logbufsecs="0" --logging_level="INFO" --max_completed_executors_per_framework="150" --oversubscribed_resources_interval="15secs" --perf_duration="10secs" --perf_interval="1mins" --port="5051" --qos_correction_interval_min="0ns" --quiet="false" --recover="reconnect" --recovery_timeout="15mins" --registration_backoff_factor="1secs" --revocable_cpu_low_priority="true" --runtime_dir="/tmp/mesos-9H3Est/agents/1/run" --sandbox_directory="/mnt/mesos/sandbox" --strict="true" --switch_user="true" --systemd_enable_support="true" --systemd_runtime_directory="/run/systemd/system" --version="false" --work_dir="/tmp/mesos-9H3Est/agents/1/work"
      I0627 16:04:20.715459  8884 replica.cpp:496] Replica received implicit promise request from __req_res__(3)@172.17.0.3:37113 with proposal 1
      I0627 16:04:20.716228  8884 leveldb.cpp:304] Persisting metadata (8 bytes) to leveldb took 693410ns
      I0627 16:04:20.716259  8884 replica.cpp:344] Persisted promised to 1
      I0627 16:04:20.717954  8875 coordinator.cpp:238] Coordinator attempting to fill missing positions
      I0627 16:04:20.718142  8847 resolver.cpp:69] Creating default secret resolver
      I0627 16:04:20.718848  8847 containerizer.cpp:230] Using isolation: filesystem/posix,posix/cpu,posix/mem,network/cni,environment_secret
      W0627 16:04:20.719561  8847 backend.cpp:76] Failed to create 'aufs' backend: AufsBackend requires root privileges
      W0627 16:04:20.719844  8847 backend.cpp:76] Failed to create 'bind' backend: BindBackend requires root privileges
      I0627 16:04:20.719885  8847 provisioner.cpp:255] Using default backend 'copy'
      I0627 16:04:20.721727  8872 replica.cpp:391] Replica received explicit promise request from __req_res__(4)@172.17.0.3:37113 for position 0 with proposal 2
      I0627 16:04:20.722569  8872 leveldb.cpp:341] Persisting action (8 bytes) to leveldb took 685913ns
      I0627 16:04:20.722607  8872 replica.cpp:711] Persisted action NOP at position 0
      I0627 16:04:20.724098  8872 slave.cpp:249] Mesos agent started on (3)@172.17.0.3:37113
      I0627 16:04:20.724123  8872 slave.cpp:250] Flags at startup: --acls="permissive: true
      register_frameworks {
        principals {
          type: ANY
        }
        roles {
          type: SOME
          values: "test"
        }
      }
      " --appc_simple_discovery_uri_prefix="http://" --appc_store_dir="/tmp/mesos/store/appc" --authenticate_http_readonly="false" --authenticate_http_readwrite="false" --authenticatee="crammd5" --authentication_backoff_factor="1secs" --authorizer="local" --cgroups_cpu_enable_pids_and_tids_count="false" --cgroups_enable_cfs="false" --cgroups_hierarchy="/sys/fs/cgroup" --cgroups_limit_swap="false" --cgroups_root="mesos" --container_disk_watch_interval="15secs" --containerizers="mesos" --default_role="*" --disk_watch_interval="1mins" --docker="docker" --docker_kill_orphans="true" --docker_registry="https://registry-1.docker.io" --docker_remove_delay="6hrs" --docker_socket="/var/run/docker.sock" --docker_stop_timeout="0ns" --docker_store_dir="/tmp/mesos/store/docker" --docker_volume_checkpoint_dir="/var/run/mesos/isolators/docker/volume" --enforce_container_disk_quota="false" --executor_registration_timeout="1mins" --executor_reregistration_timeout="2secs" --executor_shutdown_grace_period="5secs" --fetcher_cache_dir="/tmp/mesos-9H3Est/agents/2/fetch" --fetcher_cache_size="2GB" --frameworks_home="" --gc_delay="1weeks" --gc_disk_headroom="0.1" --hadoop_home="" --help="false" --hostname_lookup="true" --http_command_executor="false" --http_heartbeat_interval="30secs" --initialize_driver_logging="true" --isolation="filesystem/posix,posix/cpu,posix/mem" --launcher="posix" --launcher_dir="/mesos/mesos-1.4.0/_build/src" --logbufsecs="0" --logging_level="INFO" --max_completed_executors_per_framework="150" --oversubscribed_resources_interval="15secs" --perf_duration="10secs" --perf_interval="1mins" --port="5051" --qos_correction_interval_min="0ns" --quiet="false" --recover="reconnect" --recovery_timeout="15mins" --registration_backoff_factor="1secs" --revocable_cpu_low_priority="true" --runtime_dir="/tmp/mesos-9H3Est/agents/2/run" --sandbox_directory="/mnt/mesos/sandbox" --strict="true" --switch_user="true" --systemd_enable_support="true" --systemd_runtime_directory="/run/systemd/system" --version="false" --work_dir="/tmp/mesos-9H3Est/agents/2/work"
      I0627 16:04:20.725759  8847 sched.cpp:232] Version: 1.4.0
      I0627 16:04:20.727453  8877 replica.cpp:540] Replica received write request for position 0 from __req_res__(5)@172.17.0.3:37113
      I0627 16:04:20.727560  8877 leveldb.cpp:436] Reading position from leveldb took 65766ns
      I0627 16:04:20.727771  8884 sched.cpp:336] New master detected at master@172.17.0.3:37113
      I0627 16:04:20.728091  8877 leveldb.cpp:341] Persisting action (14 bytes) to leveldb took 432923ns
      I0627 16:04:20.728114  8877 replica.cpp:711] Persisted action NOP at position 0
      I0627 16:04:20.728425  8884 sched.cpp:352] No credentials provided. Attempting to register without authentication
      I0627 16:04:20.728458  8884 sched.cpp:836] Sending SUBSCRIBE call to master@172.17.0.3:37113
      I0627 16:04:20.729207  8876 replica.cpp:694] Replica received learned notice for position 0 from @0.0.0.0:0
      I0627 16:04:20.729523  8884 sched.cpp:869] Will retry registration in 1.548772469secs if necessary
      I0627 16:04:20.729578  8887 master.cpp:1523] Dropping 'mesos.scheduler.Call' message since not recovered yet
      I0627 16:04:20.729859  8876 leveldb.cpp:341] Persisting action (16 bytes) to leveldb took 594682ns
      I0627 16:04:20.729890  8876 replica.cpp:711] Persisted action NOP at position 0
      I0627 16:04:20.730602  8882 log.cpp:552] Writer started with ending position 0
      I0627 16:04:20.704807  8878 slave.cpp:553] Agent resources: [{"name":"cpus","scalar":{"value":16.0},"type":"SCALAR"},{"name":"mem","scalar":{"value":47270.0},"type":"SCALAR"},{"name":"disk","scalar":{"value":3701220.0},"type":"SCALAR"},{"name":"ports","ranges":{"range":[{"begin":31000,"end":32000}]},"type":"RANGES"}]
      I0627 16:04:20.716545  8880 slave.cpp:553] Agent resources: [{"name":"cpus","scalar":{"value":16.0},"type":"SCALAR"},{"name":"mem","scalar":{"value":47270.0},"type":"SCALAR"},{"name":"disk","scalar":{"value":3701220.0},"type":"SCALAR"},{"name":"ports","ranges":{"range":[{"begin":31000,"end":32000}]},"type":"RANGES"}]
      I0627 16:04:20.732960  8880 slave.cpp:561] Agent attributes: [  ]
      I0627 16:04:20.733045  8878 slave.cpp:561] Agent attributes: [  ]
      I0627 16:04:20.730206  8872 slave.cpp:553] Agent resources: [{"name":"cpus","scalar":{"value":16.0},"type":"SCALAR"},{"name":"mem","scalar":{"value":47270.0},"type":"SCALAR"},{"name":"disk","scalar":{"value":3701220.0},"type":"SCALAR"},{"name":"ports","ranges":{"range":[{"begin":31000,"end":32000}]},"type":"RANGES"}]
      I0627 16:04:20.733058  8880 slave.cpp:566] Agent hostname: 089cb2cc2625
      I0627 16:04:20.733126  8872 slave.cpp:561] Agent attributes: [  ]
      I0627 16:04:20.733155  8872 slave.cpp:566] Agent hostname: 089cb2cc2625
      I0627 16:04:20.733086  8878 slave.cpp:566] Agent hostname: 089cb2cc2625
      I0627 16:04:20.733321  8879 leveldb.cpp:436] Reading position from leveldb took 79459ns
      I0627 16:04:20.733381  8887 status_update_manager.cpp:177] Pausing sending status updates
      I0627 16:04:20.733424  8887 status_update_manager.cpp:177] Pausing sending status updates
      I0627 16:04:20.733777  8887 status_update_manager.cpp:177] Pausing sending status updates
      I0627 16:04:20.737633  8885 state.cpp:64] Recovering state from '/tmp/mesos-9H3Est/agents/2/work/meta'
      I0627 16:04:20.738603  8874 status_update_manager.cpp:203] Recovering status update manager
      *** Error in `/mesos/mesos-1.4.0/_build/src/.libs/lt-dynamic-reservation-framework': double free or corruption (fasttop): 0x00002b7eac002ae0 ***
      ../../src/tests/script.cpp:83: Failure
      Failed
      dynamic_reservation_framework_test.sh terminated with signal Aborted
      [  FAILED  ] ExamplesTest.DynamicReservationFramework (2904 ms)
      
      

      Attachments

        Issue Links

          Activity

            People

              Unassigned Unassigned
              vinodkone Vinod Kone
              Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: