Details
- Bug
- Status: Resolved
- Major
- Resolution: Duplicate
- Impala 2.3.0
- None
Description
See the test case below:
impala-shell -q "DROP TABLE IF EXISTS tabtest";
impala-shell -q "CREATE TABLE tabtest(col1 string,col2 string) ROW FORMAT DELIMITED FIELDS TERMINATED by ','";
impala-shell -q 'INSERT OVERWRITE TABLE tabtest VALUES ("test", "\t\t\tTest"), ("test2", "Test\t\t\tTest"), ("test3", "test\tTest"), ("test4", "test,Test");';
impala-shell -o out.csv -q "SELECT * FROM tabtest" --output_delimiter="," -B
cat out.csv
The output looks like this:
test,,,,Test
test2,Test,,,Test
test3,test,Test
test4,test
I can see two issues here:
1. When a string contains tabs, every tab is replaced by the output delimiter.
2. When a string contains the delimiter, the data after the delimiter is lost (see "test4"). According to the documentation (http://www.cloudera.com/documentation/enterprise/latest/topics/impala_shell_options.html):
"If an output value contains the delimiter character, that field is quoted and/or escaped"
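For comparison, here is a minimal sketch of the output the documentation seems to describe, written with Python's standard csv module. This is not impala-shell's code; the helper name to_delimited is made up for illustration, and the rows are the ones from the test case above.

```python
import csv
import io

# Rows from the test case in this report.
rows = [
    ("test", "\t\t\tTest"),
    ("test2", "Test\t\t\tTest"),
    ("test3", "test\tTest"),
    ("test4", "test,Test"),
]

def to_delimited(rows, delimiter=","):
    """Hypothetical helper: serialize rows, quoting only fields that need it."""
    buf = io.StringIO()
    writer = csv.writer(
        buf,
        delimiter=delimiter,
        quoting=csv.QUOTE_MINIMAL,  # quote a field only if it contains the delimiter or a quote
        lineterminator="\n",
    )
    writer.writerows(rows)
    return buf.getvalue()

# Tabs stay inside their field; the "test4" value is quoted instead of split.
print(to_delimited(rows))
```

With this approach the tabs are left untouched and the last row is written as test4,"test,Test", so a CSV reader gets two columns back for every row.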
Looking at the underlying data:
hadoop fs -cat /user/hive/warehouse/tabtest/ba44046cba4c7c80-d6c4c08afd8c0cb0_1055158928_data.0.
test, Test
test2,Test Test
test3,test Test
test4,test,Test
The data is not stored properly: strings that contain the delimiter character should be quoted.
This is both a data write issue and a read/parse issue.
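The "test4" row can be reproduced with a small sketch. It assumes, for illustration only, that the writer joins values without quoting and that the reader splits on the delimiter and keeps the first N fields; both function names are hypothetical and this is not Impala's actual code path.

```python
def write_row_unescaped(values, delimiter=","):
    # Assumed write path: values are joined as-is, with no quoting or escaping.
    return delimiter.join(values)

def read_row(line, num_columns, delimiter=","):
    # Assumed read path: split on every delimiter, drop fields beyond the schema.
    return line.split(delimiter)[:num_columns]

stored = write_row_unescaped(["test4", "test,Test"])
print(stored)                           # test4,test,Test
print(read_row(stored, num_columns=2))  # ['test4', 'test']
```

Under these assumptions the stored line matches the file content shown above, and reading it back into a two-column table drops "Test", which matches the observed output.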
Issue Links
- is related to
  - IMPALA-116 The impala shell should take of strings with tabs for output formatting. (Resolved)
  - IMPALA-1840 impala-shell always treats tab as column boundaries and adds delimiter (Resolved)