Details
Type: Bug
Status: Open
Priority: Minor
Resolution: Unresolved
Affects Version/s: 2.0.2, 2.1.3, 2.2.2, 2.3.0, 2.4.5, 3.0.1
Fix Version/s: None
Description
Spark string literal parsing does not handle escaped backslashes and other escaped special characters consistently. This is an extension of this issue: https://issues.apache.org/jira/browse/SPARK-17647#
Depending on how spark.sql.parser.escapedStringLiterals is set, you can either have backslashes handled correctly but not the other special characters, or the other special characters handled correctly but not backslashes.
So you have to choose which behavior you care about more.
I have tested Spark versions 2.1, 2.2, 2.3, 2.4, and 3.0, and they all exhibit the issue:
# These do not return the expected backslash
SET spark.sql.parser.escapedStringLiterals=false;
SELECT '\\';
> \ (should return \\)
SELECT 'hi\hi';
> hihi (should return hi\hi)

# These are correctly escaped
SELECT '\"';
> "
SELECT '\'';
> '
If I switch the setting:
# These now work
SET spark.sql.parser.escapedStringLiterals=true;
SELECT '\\';
> \\
SELECT 'hi\hi';
> hi\hi

# These are now not correctly escaped
SELECT '\"';
> \" (should return ")
SELECT '\'';
> \' (should return ')
So basically we have to choose:
SET spark.sql.parser.escapedStringLiterals=false; if we want backslashes handled correctly but not the other special characters
SET spark.sql.parser.escapedStringLiterals=true; if we want the other special characters handled correctly but not backslashes
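The trade-off above can be modeled with a small sketch. This is not Spark's actual parser code, just a hypothetical simplification of the two modes as observed in the outputs above: with escapedStringLiterals=true the literal is taken raw, and with false every backslash-escape is processed, where an unrecognized escape silently drops the backslash (which is why 'hi\hi' becomes hihi).

```python
# Simplified model of the two observed parsing modes (NOT Spark's real implementation).
ESCAPES = {'n': '\n', 't': '\t', 'r': '\r', '\\': '\\', '"': '"', "'": "'"}

def parse_literal(s: str, escaped_string_literals: bool) -> str:
    """Return the string value Spark appears to produce for literal body s."""
    if escaped_string_literals:
        # true: the literal is taken verbatim, no escape processing at all
        return s
    # false: process backslash escapes; an unknown escape drops the backslash
    out = []
    i = 0
    while i < len(s):
        if s[i] == '\\' and i + 1 < len(s):
            nxt = s[i + 1]
            out.append(ESCAPES.get(nxt, nxt))  # unknown escape -> keep only next char
            i += 2
        else:
            out.append(s[i])
            i += 1
    return ''.join(out)

# Reproduces the reported behavior:
# false mode: '\\' -> '\' and 'hi\hi' -> 'hihi', but '\"' -> '"' is correct
# true mode:  '\\' -> '\\' and 'hi\hi' -> 'hi\hi', but '\"' stays '\"'
```

Neither mode in this model can simultaneously preserve a lone mid-string backslash and translate the quote escapes, which is the choice the configuration forces.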