Details
- Type: New Feature
- Status: Resolved
- Priority: Major
- Resolution: Fixed
Description
The existing mask functions can only handle ASCII characters. It would be very useful to provide mask functions that handle UTF-8 characters, or to improve the current mask functions to handle them as Hive does. Otherwise, Impala may leak information, because it works on bytes and counts each UTF-8 character as its encoded length (three bytes for most Chinese characters). For example, if we want to mask the last two characters, Impala masks only the last two bytes: the final character is corrupted and the character before it stays visible.
hive> select mask_last_n('SQL引擎', 2, 'x', 'x', 'x', 'x');
SQLxx
impala> select mask_last_n('SQL引擎', 2, 'x', 'x', 'x', 'x');
SQL引�xx
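The difference between the two outputs can be reproduced outside either engine. The Python sketch below is only an illustration, not Impala's or Hive's code: the helper names `mask_char`, `mask_last_n_chars` and `mask_last_n_bytes` are hypothetical, and the character classes are simplified to ASCII upper/lower/digit plus "other". The byte-oriented version masks the last n UTF-8 bytes, while the character-oriented version masks the last n code points.

```python
def mask_char(ch: str, upper: str, lower: str, digit: str, other: str) -> str:
    """Simplified masking rule (assumption): ASCII classes, everything else is 'other'."""
    if "A" <= ch <= "Z":
        return upper
    if "a" <= ch <= "z":
        return lower
    if "0" <= ch <= "9":
        return digit
    return other


def mask_last_n_chars(s: str, n: int, upper: str, lower: str, digit: str, other: str) -> str:
    # Character-aware: Python str indexing is by code point, not by byte.
    n = max(0, min(n, len(s)))
    keep = len(s) - n
    return s[:keep] + "".join(mask_char(c, upper, lower, digit, other) for c in s[keep:])


def mask_last_n_bytes(s: str, n: int, upper: str, lower: str, digit: str, other: str) -> str:
    # Byte-oriented: treat the value as raw UTF-8 bytes (mask chars must be ASCII).
    data = s.encode("utf-8")
    n = max(0, min(n, len(data)))
    keep = len(data) - n
    masked = bytearray(data[:keep])
    for b in data[keep:]:
        masked += mask_char(chr(b), upper, lower, digit, other).encode("ascii")
    # A truncated multi-byte sequence is invalid UTF-8; decode it as U+FFFD.
    return masked.decode("utf-8", errors="replace")


print(mask_last_n_chars("SQL引擎", 2, "x", "x", "x", "x"))  # SQLxx
print(mask_last_n_bytes("SQL引擎", 2, "x", "x", "x", "x"))  # SQL引�xx
```

'SQL引擎' is 5 characters but 9 UTF-8 bytes, so masking two bytes overwrites only the tail of 擎 and leaves its lead byte (rendered as �) and all of 引 unmasked.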
Some common scenarios:
- Masking the last two UTF-8 characters of Chinese names.
- Showing only the first few UTF-8 characters of Chinese addresses and masking all the remaining characters.
However, this depends on UTF-8 string support in the Impala backend (BE).
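Both scenarios reduce to slicing by character rather than by byte. A minimal Python sketch, assuming code points are an acceptable notion of "character" (combining sequences would be counted per code point, not per visible glyph). The helper names are hypothetical and only mirror the semantics of the SQL `mask_last_n` and `mask_show_first_n` functions; they are not Impala's implementation, and the sample name and address are made up.

```python
def mask_char(ch: str, upper: str, lower: str, digit: str, other: str) -> str:
    """Simplified masking rule (assumption): ASCII classes, everything else is 'other'."""
    if "A" <= ch <= "Z":
        return upper
    if "a" <= ch <= "z":
        return lower
    if "0" <= ch <= "9":
        return digit
    return other


def mask_last_n_chars(s: str, n: int, upper: str, lower: str, digit: str, other: str) -> str:
    # Mask the last n code points, e.g. the given name in a Chinese full name.
    n = max(0, min(n, len(s)))
    keep = len(s) - n
    return s[:keep] + "".join(mask_char(c, upper, lower, digit, other) for c in s[keep:])


def mask_show_first_n_chars(s: str, n: int, upper: str, lower: str, digit: str, other: str) -> str:
    # Keep the first n code points visible and mask everything after them.
    n = max(0, min(n, len(s)))
    return s[:n] + "".join(mask_char(c, upper, lower, digit, other) for c in s[n:])


print(mask_last_n_chars("张三丰", 2, "x", "x", "x", "x"))              # 张xx
print(mask_show_first_n_chars("北京市海淀区", 3, "x", "x", "x", "x"))  # 北京市xxx
```

With byte semantics, keeping the "first 3" of the address would expose only 北 (3 bytes), which is why the count has to be in characters.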
Issue Links
- is related to: IMPALA-5675 Support CHAR/VARCHAR length counted in number of UTF-8 characters, not bytes (In Progress)
- relates to: IMPALA-2019 Proper UTF-8 support in string functions (Resolved)