Parquet / PARQUET-1698

[C++] Add reader option to pre-buffer entire serialized row group into memory

Details

    Description

      In some scenarios (for example, reading datasets from Amazon S3), reading columns independently and allowing unbridled Read calls to the underlying file handle can yield suboptimal performance. In such cases, it may be preferable to first read the entire serialized row group into memory and then deserialize the constituent columns from that in-memory copy.

      Note that such an option would not be appropriate as a default behavior for all file handle types, since low-selectivity reads (for example, reading only 3 columns out of a file with 100 columns) will be suboptimal in some cases: buffering the whole row group also fetches the bytes of every column that is never decoded. I think it would be better for "high latency" file systems to opt into this option.

      cc fsaintjacques bkietz apitrou
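
To make the trade-off described above concrete, here is a minimal, self-contained sketch. It does not use the Parquet or Arrow C++ API; `CountingFile`, `ColumnChunk`, `ReadColumnsIndependently`, and `ReadColumnsPreBuffered` are hypothetical names invented for illustration. The file handle just counts Read calls, standing in for round trips to a high-latency store.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a high-latency file handle (e.g. an object store).
// It counts Read calls so the two strategies can be compared.
class CountingFile {
 public:
  explicit CountingFile(std::string bytes) : bytes_(std::move(bytes)) {}

  // Copies `length` bytes starting at `offset`; each call models one round trip.
  std::string ReadAt(std::size_t offset, std::size_t length) {
    ++num_reads_;
    return bytes_.substr(offset, length);
  }

  int num_reads() const { return num_reads_; }

 private:
  std::string bytes_;
  int num_reads_ = 0;
};

// Hypothetical byte range of one serialized column chunk within the file.
struct ColumnChunk {
  std::size_t offset;
  std::size_t length;
};

// Strategy 1: issue one Read call per column chunk.
std::vector<std::string> ReadColumnsIndependently(
    CountingFile& file, const std::vector<ColumnChunk>& chunks) {
  std::vector<std::string> out;
  for (const auto& c : chunks) out.push_back(file.ReadAt(c.offset, c.length));
  return out;
}

// Strategy 2: pre-buffer the span covering all requested chunks with a single
// Read call, then slice each column's bytes out of the in-memory buffer.
std::vector<std::string> ReadColumnsPreBuffered(
    CountingFile& file, const std::vector<ColumnChunk>& chunks) {
  std::vector<std::string> out;
  if (chunks.empty()) return out;
  std::size_t begin = chunks.front().offset;
  std::size_t end = begin;
  for (const auto& c : chunks) {
    begin = std::min(begin, c.offset);
    end = std::max(end, c.offset + c.length);
  }
  const std::string buffer = file.ReadAt(begin, end - begin);
  for (const auto& c : chunks) {
    out.push_back(buffer.substr(c.offset - begin, c.length));
  }
  return out;
}
```

With three adjacent chunks, the first strategy performs three Read calls and the second performs one, and both return the same bytes. The cost of the second strategy is that the single read covers everything between the first and last requested chunk, including columns that were not asked for, which is why the description suggests making it opt-in rather than the default.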

            People

              Assignee: David Li (lidavidm)
              Reporter: Wes McKinney (wesm)

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 1h