Details
Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Description
I may be wrong, but Row.hashCode in Beam 2.6.0 appears to behave nondeterministically.
Running 'ModifiedTPCHITCase' multiple times returns different results. The logs show that Row(71).hashCode on Node X and Row(71).hashCode on Node Y return different values, so the two Rows end up in different reducer tasks, which produces wrong final outputs.
I added the following code in BeamKeyExtractor to work around this. It'd be nice later to remove the code and simply use Row.hashCode once the issue is resolved.
} else if (key instanceof Row) {
  // TODO: sth sth
  return Arrays.hashCode(((Row) key).getValues().toArray());
} else {
[NODE X]
Key of ValueInGlobalWindow{value=KV
, pane=PaneInfo.NO_FIRING} is [71]
INFO 10-18 20:35:56,575 FileBlock:103 [TaskExecutor thread-1] - Write: 0 with ValueInGlobalWindow{value=KV
, pane=PaneInfo.NO_FIRING}
[NODE Y]
Key of ValueInGlobalWindow{value=KV
, pane=PaneInfo.NO_FIRING} is [71]
INFO 10-18 20:35:56,191 FileBlock:103 [TaskExecutor thread-1] - Write: 1 with ValueInGlobalWindow{value=KV
, pane=PaneInfo.NO_FIRING}
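For illustration, here is a minimal standalone sketch of why the content-based hash used in the workaround above should be stable across nodes. It does not use any Beam classes: `contentHash` and `partition` are hypothetical helpers, and the modulo-based partitioning is only an assumption about how a reducer index might be derived from a key hash, not the runner's actual code.

```java
import java.util.Arrays;
import java.util.List;

class DeterministicKeyHash {
  // Hypothetical helper mirroring the workaround: hash the field values
  // themselves instead of relying on the key object's own hashCode().
  static int contentHash(List<?> values) {
    return Arrays.hashCode(values.toArray());
  }

  // Assumed partitioning scheme (not taken from the runner):
  // map a hash to a reducer index in [0, numPartitions).
  static int partition(int hash, int numPartitions) {
    return Math.floorMod(hash, numPartitions);
  }

  public static void main(String[] args) {
    // Two separately constructed keys with the same single value, 71,
    // standing in for Row(71) on Node X and Row(71) on Node Y.
    List<Integer> keyOnNodeX = Arrays.asList(71);
    List<Integer> keyOnNodeY = Arrays.asList(71);

    int hashX = contentHash(keyOnNodeX);
    int hashY = contentHash(keyOnNodeY);

    // Integer.hashCode() depends only on the value, so both hashes match
    // and both keys map to the same partition.
    System.out.println(hashX == hashY);
    System.out.println(partition(hashX, 2) == partition(hashY, 2));
  }
}
```

Note that this only holds when every field value has a content-based hashCode itself (Integer, Long, String and so on). Values whose hashCode is identity-based, such as Java arrays like byte[] or enum constants, would still hash differently across JVMs even with this approach.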