Details
-
Bug
-
Status: Patch Available
-
Major
-
Resolution: Unresolved
-
1.27.0, 2.0.0-M5
-
None
-
RHEL 8.10, macOS Sequoia 15
Description
dstiegli1 - There are actually two issues with time conversion in ExcelReader. I'm using the following sample set of data to illustrate both issues:
Date_Standard: "07/24/24", Date_Custom: "07/25/24", Time_Standard: "4:00:00 PM", Time_Custom: "16:00:00", Timestamp: "07/26/24 16:00:00"
1. When inferring the schema, all of the Avro types are incorrectly identified as strings:
{ "type":"record", "name":"nifiRecord", "namespace":"org.apache.nifi", "fields":[ {"name":"column_0","type":["string","null"]}, {"name":"column_1","type":["string","null"]}, {"name":"column_2","type":["string","null"]}, {"name":"column_3","type":["string","null"]}, {"name":"column_4","type":["string","null"]} ] }
The output consists solely of epoch strings; the fourth value, "-2208999600000", appears to be incorrect, since it isn't inline with the other values at all.
Inferred output:
[ { "column_0" : "1721793600000", "column_1" : "1721880000000", "column_2" : "1721851200000", "column_3" : "-2208999600000", "column_4" : "1722024000000" } ]
2. The second issue occurs when the Avro schema is provided:
{ "type": "record", "name": "nifiRecord", "namespace": "org.apache.nifi", "fields": [ { "name": "Date_Standard", "type": [ "null", { "type": "int", "logicalType": "date" } ] }, { "name": "Date_Custom", "type": [ "null", { "type": "int", "logicalType": "date" } ] }, { "name": "Time_Standard", "type": [ "null", { "type": "int", "logicalType": "time-millis" } ] }, { "name": "Time_Custom", "type": [ "null", { "type": "long", "logicalType": "time-micros" } ] }, { "name": "Timestamp", "type": [ "null", { "type": "long", "logicalType": "timestamp-millis" } ] } ] }
Conversion of the time-millis and time-micros fields both fail with errors similar to this:
• 18:01:24 EDT ERROR ConvertRecord[id=a098dabc-0191-1000-6d17-3aaa911b2130] Failed to process FlowFile [filename=test.xIsx]; will route to failure: org.apache.nifi.processor.exception.ProcessException: Could not parse incoming data - Caused by: org.apache.nifi.serialization.MalformedRecordException: Read next Record from Excel XLSX failed - Caused by: org.apache.nifi.serialization.record.util.IllegalTypeConversionException: Cannot convert value [Sun Dec 31 16:00:00 EST 1899] of type class java.util.Date to Time for field Time_Custom
Changing the failed types to "string" results in order to allow conversion of the other values results in an epoch output for those values:
[ { "Date_Standard" : "07/24/2024", "Date_Custom" : "07/25/2024", "Time_Standard" : "1721851200000", "Time_Custom" : "-2208999600000", "Timestamp" : "07/26/2024 16:00:00" } ]
Given the same data in both JSON and CSV formats, with corresponding Readers, both the inferred and schema-provided outputs are as expected. This appears to be an issue in ExcelReader.
I've attached the spreadsheet & a template of the NiFi flow that I've been troubleshooting this with.