  1. Apache Arrow
  2. ARROW-17721

[C++][Gandiva] Expression Evaluation Performance Improvement using Mimalloc

Details

    • Improvement
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • None
    • None
    • C++ - Gandiva
    • None

    Description

      Arrow uses jemalloc as its default memory allocator. For my own reasons, I want to use mimalloc instead, but there seems to be a big performance difference between the two memory allocators.

      Here are my steps.

      I use simple compile options:

      -DCMAKE_BUILD_TYPE=debug
      -DARROW_JEMALLOC=OFF|ON
      -DARROW_MIMALLOC=ON|OFF
      -DARROW_GANDIVA=ON
      -DARROW_GANDIVA_STATIC_LIBSTDCPP=ON
      -DARROW_BUILD_TESTS=ON
      

       
      Then I wrote a simple test case:

      #include <gtest/gtest.h>
      #include "arrow/memory_pool.h"
      #include "arrow/status.h"
      
      #include "gandiva/projector.h"
      #include "gandiva/tests/test_util.h"
      #include "gandiva/tree_expr_builder.h"
      
      #include <chrono>
      #include <iostream>
      
      namespace gandiva {
      
      using arrow::boolean;
      using arrow::date64;
      using arrow::int32;
      using arrow::int64;
      using arrow::utf8;
      
      class TestUtf8Perf : public ::testing::Test {
       public:
        void SetUp() { pool_ = arrow::default_memory_pool(); }
      
       protected:
        arrow::MemoryPool* pool_;
      };
      
      void TestPerf(int64_t char_length, int64_t num_records) {
        // schema for input fields
        auto field_a = field("a", utf8());
        auto schema = arrow::schema({field_a});
      
        // output fields
        auto res = field("res", utf8());
      
        auto node_a = TreeExprBuilder::MakeField(field_a);
        auto upper_a = TreeExprBuilder::MakeFunction("upper", {node_a}, utf8());
        auto expr = TreeExprBuilder::MakeExpression(upper_a, res);
      
        // Build a projector for the expressions.
        std::shared_ptr<Projector> projector;
        auto status = Projector::Make(schema, {expr}, TestConfiguration(), &projector);
        EXPECT_TRUE(status.ok()) << status.message();
      
        std::string val = std::string(char_length, 'a');
        arrow::StringBuilder builder;
        for (int i = 0; i < num_records; i++) {
          auto _ = builder.Append(val);
        }
        std::shared_ptr<arrow::StringArray> array_a;
        auto _ = builder.Finish(&array_a);
      
        // prepare input record batch
        auto in_batch = arrow::RecordBatch::Make(schema, num_records, {array_a});
      
        // Time only the evaluation, using a monotonic clock.
        auto start = std::chrono::steady_clock::now();
        // Evaluate expression. TestPerf is a free function, so it cannot see the
        // fixture's pool_ member; use the default memory pool directly.
        arrow::ArrayVector outputs;
        status = projector->Evaluate(*in_batch, arrow::default_memory_pool(), &outputs);
        EXPECT_TRUE(status.ok()) << status.message();
      
        auto elapsed = std::chrono::duration_cast<std::chrono::milliseconds>(
            std::chrono::steady_clock::now() - start);
        std::cout << elapsed.count() << "ms" << std::endl;
      }
      TEST_F(TestUtf8Perf, TestMemoryAllocsPerf) {
        TestPerf(20, 10000);
        TestPerf(20, 100000);
        TestPerf(200, 10000);
        TestPerf(200, 100000);
        TestPerf(2000, 10000);
      }
      
      }  // namespace gandiva
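
The test above brackets the Evaluate call with two clock readings. A minimal sketch of how that timing could be factored into a reusable helper, using only the standard library. `TimeMillis` is a hypothetical name for illustration and is not part of Arrow or Gandiva; it uses `std::chrono::steady_clock`, which is monotonic and therefore not affected by wall-clock adjustments.

```cpp
#include <chrono>
#include <cstdint>
#include <thread>
#include <utility>

// Hypothetical helper (not an Arrow or Gandiva API): runs `fn` once and
// returns the elapsed time in whole milliseconds, measured with a
// monotonic clock.
template <typename Fn>
int64_t TimeMillis(Fn&& fn) {
  const auto start = std::chrono::steady_clock::now();
  std::forward<Fn>(fn)();
  const auto stop = std::chrono::steady_clock::now();
  return std::chrono::duration_cast<std::chrono::milliseconds>(stop - start)
      .count();
}

// Example workload used only to demonstrate the helper: sleeps for the
// requested number of milliseconds.
inline void SleepFor(int64_t millis) {
  std::this_thread::sleep_for(std::chrono::milliseconds(millis));
}
```

In the test, the call to `projector->Evaluate(...)` could be wrapped in a lambda and passed to such a helper, so that only the evaluation itself is timed.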
      

      This test evaluates the expression upper(a), where every value of a is a string of 20, 200, or 2000 characters. The evaluation times are:

      char_length  num_records  Mimalloc (ms)  Jemalloc (ms)
      20           10000        29             3
      20           100000       2686           26
      200          10000        954            11
      200          100000       220153         118
      2000         10000        21162          89
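
To make the size of the gap explicit, the figures above can be turned into slowdown ratios (mimalloc time divided by jemalloc time). The sketch below only redoes the arithmetic on the reported numbers; the struct and function names are hypothetical. The ratios range from about 10x in the smallest case to more than 1800x in the 200-character, 100000-record case. Going from 10000 to 100000 records multiplies the mimalloc time by roughly 93x and 231x for the two string lengths measured, against roughly 9x and 11x with jemalloc.

```cpp
#include <cstdint>
#include <vector>

// One row of the reported results (names are illustrative only).
struct Measurement {
  int64_t char_length;
  int64_t num_records;
  int64_t mimalloc_ms;
  int64_t jemalloc_ms;
};

// The five measurements, copied verbatim from the table in the description.
inline std::vector<Measurement> ReportedMeasurements() {
  return {
      {20, 10000, 29, 3},
      {20, 100000, 2686, 26},
      {200, 10000, 954, 11},
      {200, 100000, 220153, 118},
      {2000, 10000, 21162, 89},
  };
}

// How many times slower the mimalloc run was than the jemalloc run.
inline double Slowdown(const Measurement& m) {
  return static_cast<double>(m.mimalloc_ms) /
         static_cast<double>(m.jemalloc_ms);
}

// How much the mimalloc time grew between two rows, for example when the
// record count increases tenfold at a fixed string length.
inline double MimallocGrowth(const Measurement& small, const Measurement& large) {
  return static_cast<double>(large.mimalloc_ms) /
         static_cast<double>(small.mimalloc_ms);
}
```
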

       
      Is this performance gap expected? Are there other compile options I should be aware of? How can I improve performance when using mimalloc?

            People

              jinshang Jin Shang
              NinaPeng Jiangtao Peng