IMPALA-13358

Add prefixes when writing files to S3-compatible object stores


Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Component/s: Backend

    Description

      From the AWS Big Data Blog (https://aws.amazon.com/blogs/big-data/best-practices-to-optimize-data-access-performance-from-amazon-emr-and-aws-glue-to-amazon-s3/):

      Amazon S3 performance isn’t defined per bucket, but per prefix in a bucket. Your applications can achieve at least 3,500 PUT/COPY/POST/DELETE or 5,500 GET/HEAD requests per second per prefix in a bucket. Additionally, there are no limits to the number of prefixes in a bucket, so you can horizontally scale your read or write performance using parallelization. For example, if you create 10 prefixes in an S3 bucket to parallelize reads, you could scale your read performance to 55,000 read requests per second. You can similarly scale writes by writing data across multiple prefixes.
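
      The arithmetic in the quoted passage can be checked with a short sketch. The per-prefix rates are the figures quoted above; the function name `aggregate_rps` is hypothetical, and the result assumes load is spread evenly across prefixes.

```python
# Per-prefix request rates quoted in the AWS passage (requests per second).
WRITE_RPS_PER_PREFIX = 3_500   # PUT/COPY/POST/DELETE
READ_RPS_PER_PREFIX = 5_500    # GET/HEAD


def aggregate_rps(num_prefixes: int, per_prefix_rps: int) -> int:
    """Aggregate request rate, assuming load is spread evenly across prefixes."""
    if num_prefixes < 1:
        raise ValueError("num_prefixes must be at least 1")
    return num_prefixes * per_prefix_rps


print(aggregate_rps(10, READ_RPS_PER_PREFIX))   # 55000, matching the quoted example
print(aggregate_rps(10, WRITE_RPS_PER_PREFIX))  # 35000
```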
        

      Impala should also support computing prefixes and using them when writing data and delete files to object stores, so that requests are spread across prefixes and subsequent scans are faster.

      Once this is supported, Impala can honour the Iceberg table property `write.object-storage.enabled` for Iceberg tables.
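
      One possible way to compute such a prefix is to hash the file name and insert part of the digest as a path component, so that files written to the same table land under many different prefixes. The sketch below is illustrative only: the function `object_store_path`, the `data` directory, the SHA-256 hash, and the 8-character prefix length are all assumptions, not Impala's or Iceberg's actual layout.

```python
import hashlib
import posixpath


def object_store_path(table_location: str, file_name: str, prefix_len: int = 8) -> str:
    """Build a file path with a hash-derived prefix component.

    Hypothetical layout: <table_location>/data/<hash prefix>/<file_name>.
    SHA-256 is a stand-in hash chosen for this sketch.
    """
    if "/" in file_name:
        raise ValueError("file_name must be a bare name, not a path")
    digest = hashlib.sha256(file_name.encode("utf-8")).hexdigest()
    # The same file name always maps to the same prefix, so the path is stable.
    return posixpath.join(table_location, "data", digest[:prefix_len], file_name)


# Example with made-up bucket and file names.
for name in ("part-00000.parq", "part-00001.parq", "delete-00000.parq"):
    print(object_store_path("s3a://bucket/warehouse/db/tbl", name))
```

      Because the prefix depends only on the file name, a writer needs no coordination to pick it, and nearly identical file names still end up under unrelated prefixes.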

      References:

      https://aws.amazon.com/blogs/big-data/best-practices-to-optimize-data-access-performance-from-amazon-emr-and-aws-glue-to-amazon-s3/

      https://www.dremio.com/blog/ensuring-high-performance-at-any-scale-with-apache-icebergs-object-store-file-layout/

       

       


          People

            Assignee: Unassigned
            Reporter: Manish Maheshwari