Details
-
Bug
-
Status: Resolved
-
Major
-
Resolution: Fixed
-
None
-
None
Description
Observed this on ASF CI
[ RUN ] ExamplesTest.DynamicReservationFramework Using temporary directory '/tmp/ExamplesTest_DynamicReservationFramework_uPVIaN' /mesos/mesos-1.4.0/src/tests/dynamic_reservation_framework_test.sh: line 19: /mesos/mesos-1.4.0/_build/src/colors.sh: No such file or directory /mesos/mesos-1.4.0/src/tests/dynamic_reservation_framework_test.sh: line 20: /mesos/mesos-1.4.0/_build/src/atexit.sh: No such file or directory WARNING: Logging before InitGoogleLogging() is written to STDERR I0627 16:04:20.661948 8847 process.cpp:1282] libprocess is initialized on 172.17.0.3:37113 with 16 worker threads I0627 16:04:20.662199 8847 logging.cpp:199] Logging to STDERR I0627 16:04:20.674317 8847 leveldb.cpp:174] Opened db in 3.343216ms I0627 16:04:20.675655 8847 leveldb.cpp:181] Compacted db in 1.264481ms I0627 16:04:20.675797 8847 leveldb.cpp:196] Created db iterator in 89655ns I0627 16:04:20.675829 8847 leveldb.cpp:202] Seeked to beginning of db in 5551ns I0627 16:04:20.675848 8847 leveldb.cpp:271] Iterated through 0 keys in the db in 1133ns I0627 16:04:20.676103 8847 replica.cpp:779] Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned I0627 16:04:20.680465 8873 recover.cpp:451] Starting replica recovery I0627 16:04:20.681649 8873 recover.cpp:477] Replica is in EMPTY status I0627 16:04:20.682160 8847 local.cpp:272] Creating default 'local' authorizer I0627 16:04:20.684504 8884 replica.cpp:676] Replica in EMPTY status received a broadcasted recover request from __req_res__(1)@172.17.0.3:37113 I0627 16:04:20.685750 8882 recover.cpp:197] Received a recover response from a replica in EMPTY status I0627 16:04:20.686617 8877 recover.cpp:568] Updating replica status to STARTING I0627 16:04:20.688508 8877 leveldb.cpp:304] Persisting metadata (8 bytes) to leveldb took 741914ns I0627 16:04:20.688544 8881 master.cpp:438] Master c1c3a180-5bd3-42fa-b84e-e2c30aba7364 (089cb2cc2625) started on 172.17.0.3:37113 I0627 16:04:20.688551 8877 replica.cpp:322] Persisted replica status to STARTING I0627 16:04:20.689095 8878 recover.cpp:477] Replica is in STARTING status I0627 16:04:20.688582 8881 master.cpp:440] Flags at startup: --acls="permissive: true register_frameworks { principals { type: ANY } roles { type: SOME values: "test" } } " --agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" --allocation_interval="1secs" --allocator="HierarchicalDRF" --authenticate_agents="false" --authenticate_frameworks="false" --authenticate_http_frameworks="false" --authenticate_http_readonly="false" --authenticate_http_readwrite="false" --authenticators="crammd5" --authorizers="local" --credentials="/tmp/ExamplesTest_DynamicReservationFramework_uPVIaN/credentials" --filter_gpu_resources="true" --framework_sorter="drf" --help="false" --hostname_lookup="true" --http_authenticators="basic" --initialize_driver_logging="true" --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" --max_agent_ping_timeouts="5" --max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" --max_unreachable_tasks_per_framework="1000" --port="5050" --quiet="false" --recovery_agent_removal_limit="100%" --registry="replicated_log" --registry_fetch_timeout="1mins" --registry_gc_interval="15mins" --registry_max_agent_age="2weeks" --registry_max_agent_count="102400" --registry_store_timeout="20secs" --registry_strict="false" --root_submissions="true" --user_sorter="drf" --version="false" --webui_dir="/mesos/mesos-1.4.0/src/webui" --work_dir="/tmp/mesos-9H3Est/master" --zk_session_timeout="10secs" I0627 16:04:20.689460 8881 master.cpp:492] Master allowing unauthenticated frameworks to register I0627 16:04:20.689476 8881 master.cpp:506] Master allowing unauthenticated agents to register I0627 16:04:20.689482 8881 master.cpp:520] Master allowing HTTP frameworks to register without authentication I0627 16:04:20.689494 8881 credentials.hpp:37] Loading credentials for authentication from '/tmp/ExamplesTest_DynamicReservationFramework_uPVIaN/credentials' W0627 16:04:20.689620 8881 credentials.hpp:52] Permissions on credentials file '/tmp/ExamplesTest_DynamicReservationFramework_uPVIaN/credentials' are too open; it is recommended that your credentials file is NOT accessible by others I0627 16:04:20.689817 8881 master.cpp:562] Using default 'crammd5' authenticator I0627 16:04:20.690021 8881 authenticator.cpp:520] Initializing server SASL I0627 16:04:20.690615 8878 replica.cpp:676] Replica in STARTING status received a broadcasted recover request from __req_res__(2)@172.17.0.3:37113 I0627 16:04:20.691058 8877 recover.cpp:197] Received a recover response from a replica in STARTING status I0627 16:04:20.691951 8885 recover.cpp:568] Updating replica status to VOTING I0627 16:04:20.692548 8881 auxprop.cpp:73] Initialized in-memory auxiliary property plugin I0627 16:04:20.692679 8881 master.cpp:642] Authorization enabled I0627 16:04:20.692991 8872 leveldb.cpp:304] Persisting metadata (8 bytes) to leveldb took 525491ns I0627 16:04:20.693245 8872 replica.cpp:322] Persisted replica status to VOTING I0627 16:04:20.693567 8882 recover.cpp:582] Successfully joined the Paxos group I0627 16:04:20.693084 8876 hierarchical.cpp:169] Initialized hierarchical allocator process I0627 16:04:20.693971 8882 recover.cpp:466] Recover process terminated I0627 16:04:20.693136 8885 whitelist_watcher.cpp:77] No whitelist given I0627 16:04:20.695565 8847 resolver.cpp:69] Creating default secret resolver I0627 16:04:20.696056 8847 containerizer.cpp:230] Using isolation: filesystem/posix,posix/cpu,posix/mem,network/cni,environment_secret W0627 16:04:20.697029 8847 backend.cpp:76] Failed to create 'aufs' backend: AufsBackend requires root privileges W0627 16:04:20.697257 8847 backend.cpp:76] Failed to create 'bind' backend: BindBackend requires root privileges I0627 16:04:20.697340 8847 provisioner.cpp:255] Using default backend 'copy' I0627 16:04:20.702487 8878 slave.cpp:249] Mesos agent started on (1)@172.17.0.3:37113 I0627 16:04:20.702535 8878 slave.cpp:250] Flags at startup: --acls="permissive: true register_frameworks { principals { type: ANY } roles { type: SOME values: "test" } } " --appc_simple_discovery_uri_prefix="http://" --appc_store_dir="/tmp/mesos/store/appc" --authenticate_http_readonly="false" --authenticate_http_readwrite="false" --authenticatee="crammd5" --authentication_backoff_factor="1secs" --authorizer="local" --cgroups_cpu_enable_pids_and_tids_count="false" --cgroups_enable_cfs="false" --cgroups_hierarchy="/sys/fs/cgroup" --cgroups_limit_swap="false" --cgroups_root="mesos" --container_disk_watch_interval="15secs" --containerizers="mesos" --default_role="*" --disk_watch_interval="1mins" --docker="docker" --docker_kill_orphans="true" --docker_registry="https://registry-1.docker.io" --docker_remove_delay="6hrs" --docker_socket="/var/run/docker.sock" --docker_stop_timeout="0ns" --docker_store_dir="/tmp/mesos/store/docker" --docker_volume_checkpoint_dir="/var/run/mesos/isolators/docker/volume" --enforce_container_disk_quota="false" --executor_registration_timeout="1mins" --executor_reregistration_timeout="2secs" --executor_shutdown_grace_period="5secs" --fetcher_cache_dir="/tmp/mesos-9H3Est/agents/0/fetch" --fetcher_cache_size="2GB" --frameworks_home="" --gc_delay="1weeks" --gc_disk_headroom="0.1" --hadoop_home="" --help="false" --hostname_lookup="true" --http_command_executor="false" --http_heartbeat_interval="30secs" --initialize_driver_logging="true" --isolation="filesystem/posix,posix/cpu,posix/mem" --launcher="posix" --launcher_dir="/mesos/mesos-1.4.0/_build/src" --logbufsecs="0" --logging_level="INFO" --max_completed_executors_per_framework="150" --oversubscribed_resources_interval="15secs" --perf_duration="10secs" --perf_interval="1mins" --port="5051" --qos_correction_interval_min="0ns" --quiet="false" --recover="reconnect" --recovery_timeout="15mins" --registration_backoff_factor="1secs" --revocable_cpu_low_priority="true" --runtime_dir="/tmp/mesos-9H3Est/agents/0/run" --sandbox_directory="/mnt/mesos/sandbox" --strict="true" --switch_user="true" --systemd_enable_support="true" --systemd_runtime_directory="/run/systemd/system" --version="false" --work_dir="/tmp/mesos-9H3Est/agents/0/work" I0627 16:04:20.706677 8847 resolver.cpp:69] Creating default secret resolver I0627 16:04:20.707231 8847 containerizer.cpp:230] Using isolation: filesystem/posix,posix/cpu,posix/mem,network/cni,environment_secret W0627 16:04:20.708060 8847 backend.cpp:76] Failed to create 'aufs' backend: AufsBackend requires root privileges W0627 16:04:20.708179 8847 backend.cpp:76] Failed to create 'bind' backend: BindBackend requires root privileges I0627 16:04:20.708231 8847 provisioner.cpp:255] Using default backend 'copy' I0627 16:04:20.710345 8881 master.cpp:2161] Elected as the leading master! I0627 16:04:20.710400 8881 master.cpp:1700] Recovering from registrar I0627 16:04:20.710644 8886 registrar.cpp:345] Recovering registrar I0627 16:04:20.711997 8881 log.cpp:536] Attempting to start the writer I0627 16:04:20.714257 8880 slave.cpp:249] Mesos agent started on (2)@172.17.0.3:37113 I0627 16:04:20.714516 8880 slave.cpp:250] Flags at startup: --acls="permissive: true register_frameworks { principals { type: ANY } roles { type: SOME values: "test" } } " --appc_simple_discovery_uri_prefix="http://" --appc_store_dir="/tmp/mesos/store/appc" --authenticate_http_readonly="false" --authenticate_http_readwrite="false" --authenticatee="crammd5" --authentication_backoff_factor="1secs" --authorizer="local" --cgroups_cpu_enable_pids_and_tids_count="false" --cgroups_enable_cfs="false" --cgroups_hierarchy="/sys/fs/cgroup" --cgroups_limit_swap="false" --cgroups_root="mesos" --container_disk_watch_interval="15secs" --containerizers="mesos" --default_role="*" --disk_watch_interval="1mins" --docker="docker" --docker_kill_orphans="true" --docker_registry="https://registry-1.docker.io" --docker_remove_delay="6hrs" --docker_socket="/var/run/docker.sock" --docker_stop_timeout="0ns" --docker_store_dir="/tmp/mesos/store/docker" --docker_volume_checkpoint_dir="/var/run/mesos/isolators/docker/volume" --enforce_container_disk_quota="false" --executor_registration_timeout="1mins" --executor_reregistration_timeout="2secs" --executor_shutdown_grace_period="5secs" --fetcher_cache_dir="/tmp/mesos-9H3Est/agents/1/fetch" --fetcher_cache_size="2GB" --frameworks_home="" --gc_delay="1weeks" --gc_disk_headroom="0.1" --hadoop_home="" --help="false" --hostname_lookup="true" --http_command_executor="false" --http_heartbeat_interval="30secs" --initialize_driver_logging="true" --isolation="filesystem/posix,posix/cpu,posix/mem" --launcher="posix" --launcher_dir="/mesos/mesos-1.4.0/_build/src" --logbufsecs="0" --logging_level="INFO" --max_completed_executors_per_framework="150" --oversubscribed_resources_interval="15secs" --perf_duration="10secs" --perf_interval="1mins" --port="5051" --qos_correction_interval_min="0ns" --quiet="false" --recover="reconnect" --recovery_timeout="15mins" --registration_backoff_factor="1secs" --revocable_cpu_low_priority="true" --runtime_dir="/tmp/mesos-9H3Est/agents/1/run" --sandbox_directory="/mnt/mesos/sandbox" --strict="true" --switch_user="true" --systemd_enable_support="true" --systemd_runtime_directory="/run/systemd/system" --version="false" --work_dir="/tmp/mesos-9H3Est/agents/1/work" I0627 16:04:20.715459 8884 replica.cpp:496] Replica received implicit promise request from __req_res__(3)@172.17.0.3:37113 with proposal 1 I0627 16:04:20.716228 8884 leveldb.cpp:304] Persisting metadata (8 bytes) to leveldb took 693410ns I0627 16:04:20.716259 8884 replica.cpp:344] Persisted promised to 1 I0627 16:04:20.717954 8875 coordinator.cpp:238] Coordinator attempting to fill missing positions I0627 16:04:20.718142 8847 resolver.cpp:69] Creating default secret resolver I0627 16:04:20.718848 8847 containerizer.cpp:230] Using isolation: filesystem/posix,posix/cpu,posix/mem,network/cni,environment_secret W0627 16:04:20.719561 8847 backend.cpp:76] Failed to create 'aufs' backend: AufsBackend requires root privileges W0627 16:04:20.719844 8847 backend.cpp:76] Failed to create 'bind' backend: BindBackend requires root privileges I0627 16:04:20.719885 8847 provisioner.cpp:255] Using default backend 'copy' I0627 16:04:20.721727 8872 replica.cpp:391] Replica received explicit promise request from __req_res__(4)@172.17.0.3:37113 for position 0 with proposal 2 I0627 16:04:20.722569 8872 leveldb.cpp:341] Persisting action (8 bytes) to leveldb took 685913ns I0627 16:04:20.722607 8872 replica.cpp:711] Persisted action NOP at position 0 I0627 16:04:20.724098 8872 slave.cpp:249] Mesos agent started on (3)@172.17.0.3:37113 I0627 16:04:20.724123 8872 slave.cpp:250] Flags at startup: --acls="permissive: true register_frameworks { principals { type: ANY } roles { type: SOME values: "test" } } " --appc_simple_discovery_uri_prefix="http://" --appc_store_dir="/tmp/mesos/store/appc" --authenticate_http_readonly="false" --authenticate_http_readwrite="false" --authenticatee="crammd5" --authentication_backoff_factor="1secs" --authorizer="local" --cgroups_cpu_enable_pids_and_tids_count="false" --cgroups_enable_cfs="false" --cgroups_hierarchy="/sys/fs/cgroup" --cgroups_limit_swap="false" --cgroups_root="mesos" --container_disk_watch_interval="15secs" --containerizers="mesos" --default_role="*" --disk_watch_interval="1mins" --docker="docker" --docker_kill_orphans="true" --docker_registry="https://registry-1.docker.io" --docker_remove_delay="6hrs" --docker_socket="/var/run/docker.sock" --docker_stop_timeout="0ns" --docker_store_dir="/tmp/mesos/store/docker" --docker_volume_checkpoint_dir="/var/run/mesos/isolators/docker/volume" --enforce_container_disk_quota="false" --executor_registration_timeout="1mins" --executor_reregistration_timeout="2secs" --executor_shutdown_grace_period="5secs" --fetcher_cache_dir="/tmp/mesos-9H3Est/agents/2/fetch" --fetcher_cache_size="2GB" --frameworks_home="" --gc_delay="1weeks" --gc_disk_headroom="0.1" --hadoop_home="" --help="false" --hostname_lookup="true" --http_command_executor="false" --http_heartbeat_interval="30secs" --initialize_driver_logging="true" --isolation="filesystem/posix,posix/cpu,posix/mem" --launcher="posix" --launcher_dir="/mesos/mesos-1.4.0/_build/src" --logbufsecs="0" --logging_level="INFO" --max_completed_executors_per_framework="150" --oversubscribed_resources_interval="15secs" --perf_duration="10secs" --perf_interval="1mins" --port="5051" --qos_correction_interval_min="0ns" --quiet="false" --recover="reconnect" --recovery_timeout="15mins" --registration_backoff_factor="1secs" --revocable_cpu_low_priority="true" --runtime_dir="/tmp/mesos-9H3Est/agents/2/run" --sandbox_directory="/mnt/mesos/sandbox" --strict="true" --switch_user="true" --systemd_enable_support="true" --systemd_runtime_directory="/run/systemd/system" --version="false" --work_dir="/tmp/mesos-9H3Est/agents/2/work" I0627 16:04:20.725759 8847 sched.cpp:232] Version: 1.4.0 I0627 16:04:20.727453 8877 replica.cpp:540] Replica received write request for position 0 from __req_res__(5)@172.17.0.3:37113 I0627 16:04:20.727560 8877 leveldb.cpp:436] Reading position from leveldb took 65766ns I0627 16:04:20.727771 8884 sched.cpp:336] New master detected at master@172.17.0.3:37113 I0627 16:04:20.728091 8877 leveldb.cpp:341] Persisting action (14 bytes) to leveldb took 432923ns I0627 16:04:20.728114 8877 replica.cpp:711] Persisted action NOP at position 0 I0627 16:04:20.728425 8884 sched.cpp:352] No credentials provided. Attempting to register without authentication I0627 16:04:20.728458 8884 sched.cpp:836] Sending SUBSCRIBE call to master@172.17.0.3:37113 I0627 16:04:20.729207 8876 replica.cpp:694] Replica received learned notice for position 0 from @0.0.0.0:0 I0627 16:04:20.729523 8884 sched.cpp:869] Will retry registration in 1.548772469secs if necessary I0627 16:04:20.729578 8887 master.cpp:1523] Dropping 'mesos.scheduler.Call' message since not recovered yet I0627 16:04:20.729859 8876 leveldb.cpp:341] Persisting action (16 bytes) to leveldb took 594682ns I0627 16:04:20.729890 8876 replica.cpp:711] Persisted action NOP at position 0 I0627 16:04:20.730602 8882 log.cpp:552] Writer started with ending position 0 I0627 16:04:20.704807 8878 slave.cpp:553] Agent resources: [{"name":"cpus","scalar":{"value":16.0},"type":"SCALAR"},{"name":"mem","scalar":{"value":47270.0},"type":"SCALAR"},{"name":"disk","scalar":{"value":3701220.0},"type":"SCALAR"},{"name":"ports","ranges":{"range":[{"begin":31000,"end":32000}]},"type":"RANGES"}] I0627 16:04:20.716545 8880 slave.cpp:553] Agent resources: [{"name":"cpus","scalar":{"value":16.0},"type":"SCALAR"},{"name":"mem","scalar":{"value":47270.0},"type":"SCALAR"},{"name":"disk","scalar":{"value":3701220.0},"type":"SCALAR"},{"name":"ports","ranges":{"range":[{"begin":31000,"end":32000}]},"type":"RANGES"}] I0627 16:04:20.732960 8880 slave.cpp:561] Agent attributes: [ ] I0627 16:04:20.733045 8878 slave.cpp:561] Agent attributes: [ ] I0627 16:04:20.730206 8872 slave.cpp:553] Agent resources: [{"name":"cpus","scalar":{"value":16.0},"type":"SCALAR"},{"name":"mem","scalar":{"value":47270.0},"type":"SCALAR"},{"name":"disk","scalar":{"value":3701220.0},"type":"SCALAR"},{"name":"ports","ranges":{"range":[{"begin":31000,"end":32000}]},"type":"RANGES"}] I0627 16:04:20.733058 8880 slave.cpp:566] Agent hostname: 089cb2cc2625 I0627 16:04:20.733126 8872 slave.cpp:561] Agent attributes: [ ] I0627 16:04:20.733155 8872 slave.cpp:566] Agent hostname: 089cb2cc2625 I0627 16:04:20.733086 8878 slave.cpp:566] Agent hostname: 089cb2cc2625 I0627 16:04:20.733321 8879 leveldb.cpp:436] Reading position from leveldb took 79459ns I0627 16:04:20.733381 8887 status_update_manager.cpp:177] Pausing sending status updates I0627 16:04:20.733424 8887 status_update_manager.cpp:177] Pausing sending status updates I0627 16:04:20.733777 8887 status_update_manager.cpp:177] Pausing sending status updates I0627 16:04:20.737633 8885 state.cpp:64] Recovering state from '/tmp/mesos-9H3Est/agents/2/work/meta' I0627 16:04:20.738603 8874 status_update_manager.cpp:203] Recovering status update manager *** Error in `/mesos/mesos-1.4.0/_build/src/.libs/lt-dynamic-reservation-framework': double free or corruption (fasttop): 0x00002b7eac002ae0 *** ../../src/tests/script.cpp:83: Failure Failed dynamic_reservation_framework_test.sh terminated with signal Aborted [ FAILED ] ExamplesTest.DynamicReservationFramework (2904 ms)
Attachments
Issue Links
- is blocked by
-
MESOS-8084 Double free corruption in tests due to parallel manipulation of signal and control handlers.
- Resolved