Details
Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 1.4.6
None
None
Description
I am able to import UTF-8 (non-Latin-1) data into HDFS successfully with:
sqoop import --connect jdbc:mysql://host/db --username XX --password YY \
--mysql-delimiters \
--table MYSQL_SRC_TABLE --target-dir ${SQOOP_DIR_PREFIX}/mysql_table --direct
However, using
sqoop export --connect jdbc:mysql://host/db --username XX --password YY \
--mysql-delimiters \
--table MYSQL_DEST_TABLE --export-dir ${SQOOP_DIR_PREFIX}/mysql_table \
--direct
truncates each field at the first non-Latin-1 character (e.g., a letter with an umlaut).
I tried other options, such as appending -- --default-character-set=utf8, without success.
I was able to fix the problem with the following change:
In https://svn.apache.org/repos/asf/sqoop/trunk/src/java/org/apache/sqoop/mapreduce/MySQLExportMapper.java, change line 322 from
this.mysqlCharSet = MySQLUtils.MYSQL_DEFAULT_CHARSET;
to
this.mysqlCharSet = "utf-8";
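For anyone who wants to see why the writer charset matters, here is a small standalone sketch (not Sqoop code). It assumes that MySQLUtils.MYSQL_DEFAULT_CHARSET resolves to a Latin-1 charset (ISO-8859-1) and that the export mapper encodes records through a Java Writer before handing them to mysqlimport; both are assumptions, not confirmed here. It only shows that characters outside Latin-1 do not survive an ISO-8859-1 encoder, while UTF-8 round-trips them. The actual truncation happens on the MySQL side and is not reproduced. The class and method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of Sqoop.
public class CharsetRoundTrip {

    // Encode a field through an OutputStreamWriter, as a mapper might when
    // writing records for mysqlimport (an assumption about the mapper).
    // OutputStreamWriter replaces unmappable characters with the charset's
    // replacement bytes ('?' for ISO-8859-1) instead of throwing.
    static byte[] encode(String field, Charset charset) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(out, charset)) {
            writer.write(field);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    // True if the encoded bytes decode back to the original text.
    static boolean survives(String field, Charset charset) {
        return new String(encode(field, charset), charset).equals(field);
    }

    public static void main(String[] args) {
        // "Łódź" written with Unicode escapes so the source file encoding
        // does not matter: Ł (U+0141) and ź (U+017A) are outside Latin-1,
        // ó (U+00F3) is inside it.
        String field = "\u0141\u00f3d\u017a";
        System.out.println("ISO-8859-1 intact: " + survives(field, StandardCharsets.ISO_8859_1)); // false
        System.out.println("UTF-8 intact: " + survives(field, StandardCharsets.UTF_8));           // true
    }
}
```

With ISO-8859-1 the field comes back as "?ód?", so the original characters are already lost before mysqlimport sees the data. This is consistent with the one-line change to "utf-8" fixing the export.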
Hope this helps.