Details
- Type: Bug
- Status: Closed
- Priority: Blocker
- Resolution: Fixed
- Affects Version/s: 1.1.4, 1.2.0
- Component/s: None
- Labels: None
Description
This issue has been reported on the mailing list twice:
- http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Previously-working-job-fails-on-Flink-1-2-0-td11741.html
- http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Arrays-values-in-keyBy-td7530.html
The problem is the following: we currently use just Key[].hashCode() to compute the hash when shuffling data. Java's default hashCode() implementation for arrays does not take the array's contents into account, only the object's identity (memory address).
This leads to different hash codes on the sender and receiver side, because deserialization on the receiver creates a new array object.
In Flink 1.1 this means that the data is shuffled randomly rather than keyed; in Flink 1.2 the key-groups code detects the violation of hash consistency.
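The inconsistency described above can be reproduced in plain Java: two arrays with identical contents (as on the sender and receiver side of a shuffle) get different identity-based hashes, while a content-based hash such as java.util.Arrays.hashCode() is stable. This is a minimal sketch illustrating the hashing behavior, not Flink code:

```java
import java.util.Arrays;

public class ArrayHashDemo {
    public static void main(String[] args) {
        // Two arrays with identical contents, standing in for a key
        // serialized on the sender and deserialized on the receiver.
        int[] senderKey = {1, 2, 3};
        int[] receiverKey = {1, 2, 3};

        // Object.hashCode() on arrays is identity-based: the two copies
        // virtually always produce different hashes.
        System.out.println(senderKey.hashCode() == receiverKey.hashCode());

        // A content-based hash (what a type-specific comparator would use)
        // is stable across copies.
        System.out.println(Arrays.hashCode(senderKey) == Arrays.hashCode(receiverKey));
    }
}
```

Running this typically prints `false` for the identity-based comparison and always `true` for the content-based one.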
The proper fix would be to rely on Flink's TypeComparator class, which has a type-specific hash function. But introducing this change now would break compatibility with existing code.
I'll file a JIRA for the 2.0 changes for that fix.
For 1.2.1 and 1.3.0 we should at least reject arrays as keys.
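The interim fix for 1.2.1 and 1.3.0 amounts to a preflight check that rejects array-typed keys before the job runs. A minimal sketch of such a check, assuming a hypothetical validator class and error message (this is not Flink's actual API):

```java
// Hypothetical preflight check: reject array types as keys, since their
// default hashCode() is identity-based and not stable across serialization.
public class KeySelectorValidator {
    public static void validateKeyType(Class<?> keyType) {
        if (keyType.isArray()) {
            throw new IllegalArgumentException(
                "Arrays cannot be used as keys: Object.hashCode() on arrays "
                + "is identity-based and differs between sender and receiver. "
                + "Wrap the array in a Tuple or POJO with a content-based "
                + "hashCode() instead.");
        }
    }

    public static void main(String[] args) {
        validateKeyType(String.class);   // accepted
        validateKeyType(int[].class);    // throws IllegalArgumentException
    }
}
```

The check fails fast at job-construction time instead of producing silently mis-keyed data at runtime.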
Issue Links
- is related to:
  - FLINK-5875 Use TypeComparator.hash() instead of Object.hashCode() for keying in DataStream API (Open)
  - FLINK-16555 Preflight check for known unstable hashCodes. (Closed)
  - FLINK-5299 DataStream support for arrays as keys (Closed)