Uploaded image for project: 'Traffic Server'
  1. Traffic Server
  2. TS-2302

log collation causes error logging to stop

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Closed
    • Major
    • Resolution: Fixed
    • None
    • 4.1.2
    • Logging, Tests
    • None

    Description

      When we started using the log collation ATS stopped writing to the errors.log file. Traffic Server thinks that the configured log space is exhausted:

      [Sep 18 23:00:40.817] Server {0x2b1bdcc51700} WARNING: Access logging to local log directory suspended - configured space allocation exhausted.
      [Sep 18 23:01:45.008] Server {0x2b05c4e93700} WARNING: Access logging to local log directory suspended - configured space allocation exhausted.
      [Sep 19 20:57:05.766] Server {0x2b05c4c91700} NOTE: logging space exhausted, failed to write orphan file, drop(1832) bytes
      [Sep 19 20:57:15.743] Server {0x2b05c4c91700} NOTE: logging space exhausted, failed to write orphan file, drop(4360) bytes
      

      LogConfig::update_space_used() emits the exhaustion message when LogConfig::space_to_write(headroom) fails:

      $ sudo /opt/bazinga/bin/traffic_line -r proxy.config.log.max_space_mb_headroom
      1000
      $ sudo gdb -p 28375
      ...
      (gdb) p (bool)(Log::config->space_to_write((int64_t)(1000*1024)))
      $1 = false
      
      (gdb) p *(Log::config)
      $1 = {initialized = true, reconfiguration_needed = false, logging_space_exhausted = true, m_space_used = 28313954, m_partition_space_left = 46870138880, 
        roll_log_files_now = false, log_object_manager = {_objects = 0x2b1b68758d00, _numObjects = 1, _maxObjects = 8, _APIobjects = 0x2b1b68758d50, _numAPIobjects = 0, 
          _maxAPIobjects = 8, _APImutex = 0x2b1b6874ee40}, global_filter_list = {m_filter_list = {<DLL<LogFilter, LogFilter::Link_link>> = {head = 0x0}, tail = 0x0}, 
          m_does_conjunction = true}, global_format_list = {m_format_list = {<DLL<LogFormat, LogFormat::Link_link>> = {head = 0x2b1b68764810}, tail = 0x2b1b68767e10}}, 
        log_buffer_size = 9216, max_secs_per_buffer = 5, max_space_mb_for_logs = 102400, max_space_mb_for_orphan_logs = 25, max_space_mb_headroom = 1000, 
        logfile_perm = 420, squid_log_enabled = false, xuid_logging_enabled = true, squid_log_is_ascii = false, common_log_enabled = false, common_log_is_ascii = true, 
        extended_log_enabled = false, extended_log_is_ascii = false, extended2_log_enabled = false, extended2_log_is_ascii = true, separate_icp_logs = false, 
        separate_host_logs = false, collation_mode = 2, collation_port = 8085, collation_host_tagged = false, collation_preproc_threads = 1, collation_retry_sec = 5, 
        collation_max_send_buffers = 16, rolling_enabled = 2, rolling_interval_sec = 86400, rolling_offset_hr = 0, rolling_size_mb = 10240, 
        auto_delete_rolled_files = true, custom_logs_enabled = true, search_log_enabled = false, search_rolling_interval_sec = 86400, search_server_ip_addr = 0, 
        search_server_port = 8080, search_top_sites = 100, search_log_filters = 0x0, search_url_filter = 0x0, search_log_file_one = 0x2b1b68758e00 "search_log1", 
        search_log_file_two = 0x2b1b68758e20 "search_log2", sampling_frequency = 1, file_stat_frequency = 32, space_used_frequency = 2, ascii_buffer_size = 36864, 
        max_line_size = 9216, hostname = 0x2b1b68758f20 "localhost", logfile_dir = 0x2b1b6823fcd0 "/var/log/trafficserver", squid_log_name = 0x2b1b68240960 "squid", 
        squid_log_header = 0x0, common_log_name = 0x2b1b68225a90 "common", common_log_header = 0x0, extended_log_name = 0x2b1b682419f0 "extended", 
        extended_log_header = 0x0, extended2_log_name = 0x2b1b6874f7b0 "extended2", extended2_log_header = 0x0, collation_host = 0x2b1b6874fe50 "none", 
        collation_secret = 0x2b1b6874fa20 "Aapl_Baz1ng4!", xml_config_file = 0x2b1b68758da0 "logs_xml.config", hosts_config_file = 0x2b1b68758dc0 "log_hosts.config", 
        use_orphan_log_space_value = true, m_log_collation_accept = 0x0, m_dir_entry = 0x2b1b68769e30, m_pDir = 0x0, m_disk_full = true, m_disk_low = false, 
        m_partition_full = false, m_partition_low = false, m_log_directory_inaccessible = false}
      

      Note that use_orphan_log_space_value = true.

      In proxy/logging/LogConfig.h:

         169  
         170    int get_max_space_mb() {
         171      return (use_orphan_log_space_value ? max_space_mb_for_orphan_logs : max_space_mb_for_logs);
         172    }
         173  
      

      So the orphan log space is being used to determine the maximum log space allowed. This is determined in LogConfig::init():

         742    // determine if we should use the orphan log space value or not
         743    // we use it if all objects are collation clients, or if some are and
         744    // the specified space for collation is larger than that for local files
         745    //
         746    size_t num_collation_clients = log_object_manager.get_num_collation_clients();
         747    use_orphan_log_space_value =
         748      (num_collation_clients == 0 ? false :
         749       (log_object_manager.get_num_objects() == num_collation_clients ? true :
         750        max_space_mb_for_orphan_logs > max_space_mb_for_logs));
         751  
      

      Our configuration has 1 log object that is a collation client, so log_object_manager.get_num_objects() == log_object_manager.get_num_collation_clients(), and we end up using the orphan log space.

      The fix is to treat the error.log like a standard log object and manage it like all the others. This makes the assumptions around logging thresholds turn out to be true.

      Attachments

        Activity

          People

            jamespeach James Peach
            jamespeach James Peach
            Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: