Uploaded image for project: 'Apache NiFi'
  1. Apache NiFi
  2. NIFI-13744

ExcelReader time conversion issues

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Patch Available
    • Major
    • Resolution: Unresolved
    • 1.27.0, 2.0.0-M5
    • None
    • Core Framework
    • RHEL 8.10, macOS Sequoia 15

    Description

      dstiegli1 - There are actually two issues with time conversion in ExcelReader.  I'm using the following sample set of data to illustrate both issues:

      Date_Standard: "07/24/24",
      Date_Custom: "07/25/24",
      Time_Standard: "4:00:00 PM",
      Time_Custom: "16:00:00",
      Timestamp: "07/26/24 16:00:00"

      1. When inferring the schema, all of the Avro types are incorrectly identified as strings:

      {
        "type":"record",
        "name":"nifiRecord",
        "namespace":"org.apache.nifi",
        "fields":[
          {"name":"column_0","type":["string","null"]},
          {"name":"column_1","type":["string","null"]},
          {"name":"column_2","type":["string","null"]},
          {"name":"column_3","type":["string","null"]},
          {"name":"column_4","type":["string","null"]}
        ]
      }

      The output consists solely of epoch strings; the fourth value, "-2208999600000", appears to be incorrect, since it isn't inline with the other values at all.

      Inferred output:

      [ {
        "column_0" : "1721793600000",
        "column_1" : "1721880000000",
        "column_2" : "1721851200000",
        "column_3" : "-2208999600000",
        "column_4" : "1722024000000"
      } ]

       

      2. The second issue occurs when the Avro schema is provided:

      {
        "type": "record",
        "name": "nifiRecord",
        "namespace": "org.apache.nifi",
        "fields": [
          {
            "name": "Date_Standard",
            "type": [ "null", { "type": "int", "logicalType": "date" } ]
          },
          {
            "name": "Date_Custom",
            "type": [ "null", { "type": "int", "logicalType": "date" } ]
          },
          {
            "name": "Time_Standard",
            "type": [ "null", { "type": "int", "logicalType": "time-millis" } ]
          },
          {
            "name": "Time_Custom",
            "type": [ "null", { "type": "long", "logicalType": "time-micros" } ]
          },
          {
            "name": "Timestamp",
            "type": [ "null", { "type": "long", "logicalType": "timestamp-millis" } ]
          }
        ]
      } 
      

      Conversion of the time-millis and time-micros fields both fail with errors similar to this:

      • 18:01:24 EDT ERROR
      ConvertRecord[id=a098dabc-0191-1000-6d17-3aaa911b2130] Failed to process FlowFile [filename=test.xIsx]; will route to failure: org.apache.nifi.processor.exception.ProcessException: Could not parse incoming data
      - Caused by: org.apache.nifi.serialization.MalformedRecordException: Read next Record from Excel XLSX failed
      - Caused by: org.apache.nifi.serialization.record.util.IllegalTypeConversionException: Cannot convert value [Sun Dec 31 16:00:00 EST 1899] of type class java.util.Date to Time for field Time_Custom
      

      Changing the failed types to "string" results in order to allow conversion of the other values results in an epoch output for those values:

      [ {
        "Date_Standard" : "07/24/2024",
        "Date_Custom" : "07/25/2024",
        "Time_Standard" : "1721851200000",
        "Time_Custom" : "-2208999600000",
        "Timestamp" : "07/26/2024 16:00:00"
      } ]

       

      Given the same data in both JSON and CSV formats, with corresponding Readers, both the inferred and schema-provided outputs are as expected.  This appears to be an issue in ExcelReader.

      I've attached the spreadsheet & a template of the NiFi flow that I've been troubleshooting this with.

      Attachments

        1. test.xlsx
          10 kB
          John Wise
        2. Excel_Reader_-_Time_Conversion_Issues.xml
          96 kB
          John Wise

        Activity

          People

            dstiegli1 Daniel Stieglitz
            john.wise John Wise
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

              Created:
              Updated:

              Time Tracking

                Estimated:
                Original Estimate - Not Specified
                Not Specified
                Remaining:
                Remaining Estimate - 0h
                0h
                Logged:
                Time Spent - 1h 50m
                1h 50m