  1. Apache Hudi
  2. HUDI-7267

Column stats index (CSI) can cause data loss during SQL queries

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • None
    • None
    • index

    Description

      As the attached picture shows, the column stats index (CSI) uses the Parquet column chunk metadata to calculate min/max values and saves them to the metadata table (MDT) column stats. For complex columns, such as `info array<struct<name: string, age: int>>`, the Parquet metadata contains only the leaf columns `info.array.name` and `info.array.age`, but Hudi calculates stats only for the top-level `info` column, so the stats for this column in the MDT are null.

      If the SQL filter expression then contains `IsNotNull(info)`, all files are skipped, so the query silently loses data.
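
      The failure mode can be illustrated with a small, self-contained sketch. It is not Hudi code: the dictionary layout and the helpers `lookup_stats` and `is_not_null_predicate` are hypothetical stand-ins for the MDT column stats and the data-skipping filter, the numbers are made up, and the sketch assumes that `IsNotNull(col)` is evaluated as a comparison over the stored stats that yields "unknown" when those stats are null.

```python
from typing import Optional

# Hypothetical per-file stats derived from Parquet metadata.
# Only the leaf columns of the nested type are present.
parquet_leaf_stats = {
    "info.array.name": {"null_count": 0, "value_count": 10},
    "info.array.age": {"null_count": 0, "value_count": 10},
}


def lookup_stats(column: str) -> Optional[dict]:
    """Look up stats by column name; returns None when no entry exists."""
    return parquet_leaf_stats.get(column)


def is_not_null_predicate(stats: Optional[dict]) -> Optional[bool]:
    """SQL-like three-valued evaluation of null_count < value_count.

    Returns None (unknown) when the stats are missing.
    """
    if stats is None:
        return None
    return stats["null_count"] < stats["value_count"]


# The top-level column has no entry, so its stats are null.
info_stats = lookup_stats("info")

# A filter keeps a file only when the predicate is definitely true,
# so an unknown result causes the file to be skipped.
keep_file = is_not_null_predicate(info_stats) is True
```

      Here `keep_file` ends up false even though the file does contain non-null `info` values, which matches the symptom described above.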

      Also consider ordinary columns that are added in the future: old files will not contain such a column, which may cause similar problems. So, to keep the code logic clean, check for null before evaluating the stat values: min/max/nullValue.
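
      A minimal sketch of the proposed null check, under the same assumptions as the description: the function name `keep_file_is_not_null` and the stat keys `null_count` and `value_count` are hypothetical and do not refer to Hudi's actual classes. The idea is that a file may only be skipped when the stats prove it cannot match; missing stats mean the file must be kept.

```python
from typing import Optional


def keep_file_is_not_null(stats: Optional[dict]) -> bool:
    """Decide whether a file must be read for an IsNotNull(col) filter.

    Missing stats (nested column, or a column added after the file was
    written) carry no information, so the file is kept rather than skipped.
    """
    if stats is None:
        return True  # no stats entry at all: cannot prune
    null_count = stats.get("null_count")
    value_count = stats.get("value_count")
    if null_count is None or value_count is None:
        return True  # stats entry exists but values are null: cannot prune
    # Stats are present: skip the file only if every value is null.
    return null_count < value_count
```

      With this check, a file whose stats are null is always read, and only a file whose stats show that every value is null is skipped.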

      Attachments

        1. image-2023-12-28-13-29-15-943.png
          1.53 MB
          Knight Chess

            People

              Assignee: Unassigned
              Reporter: Knight Chess (knightchess)
