Details
Type: Sub-task
Status: Resolved
Priority: Major
Resolution: Duplicate
Description
When performing reads, this warning sometimes shows up on stdout or stderr:
"Encountered string containing invalid UTF-8 data while serializing protocol buffer. Strings must contain only UTF-8; use the 'bytes' type for raw bytes"
This is most likely due to null bytes embedded in a string field.
Running under gdb with a breakpoint on wire_format.cc line 1053 yields a useful stack trace. It looks like RemoteBlockReader::async_connect eventually builds a hadoop::hdfs::ClientOperationHeaderProto, and VerifyUTF8String reports the error when that message is serialized.
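To help narrow down which string is bad, a check along the following lines could be run on each value before it is assigned to a string field of the header. This is only a sketch: HasEmbeddedNul and IsValidUtf8 are hypothetical helpers written for illustration, not part of libhdfs++ or protobuf, and where to call them is an assumption. The two checks are kept separate because a NUL byte on its own is a well-formed single-byte UTF-8 sequence, so seeing which one fires may help confirm or rule out the embedded-null theory.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not a libhdfs++ or protobuf API).
// Returns true if the string contains a NUL byte anywhere in its contents.
bool HasEmbeddedNul(const std::string& s) {
  return s.find('\0') != std::string::npos;
}

// Hypothetical helper (not a libhdfs++ or protobuf API).
// Returns true if the string is well-formed UTF-8: correct lead and
// continuation bytes, no overlong encodings, no surrogates, and no
// code points above U+10FFFF.
bool IsValidUtf8(const std::string& s) {
  const std::size_t n = s.size();
  std::size_t i = 0;
  while (i < n) {
    const unsigned char c = static_cast<unsigned char>(s[i]);
    if (c < 0x80) {  // single-byte (ASCII) sequence, including NUL
      ++i;
      continue;
    }
    std::size_t len = 0;
    unsigned int cp = 0;
    if ((c & 0xE0) == 0xC0) {
      len = 2;
      cp = c & 0x1F;
    } else if ((c & 0xF0) == 0xE0) {
      len = 3;
      cp = c & 0x0F;
    } else if ((c & 0xF8) == 0xF0) {
      len = 4;
      cp = c & 0x07;
    } else {
      return false;  // stray continuation byte or invalid lead byte
    }
    if (i + len > n) return false;  // sequence truncated at end of string
    for (std::size_t k = 1; k < len; ++k) {
      const unsigned char cc = static_cast<unsigned char>(s[i + k]);
      if ((cc & 0xC0) != 0x80) return false;  // not a continuation byte
      cp = (cp << 6) | (cc & 0x3F);
    }
    if (len == 2 && cp < 0x80) return false;     // overlong 2-byte form
    if (len == 3 && cp < 0x800) return false;    // overlong 3-byte form
    if (len == 4 && (cp < 0x10000 || cp > 0x10FFFF)) return false;
    if (cp >= 0xD800 && cp <= 0xDFFF) return false;  // UTF-16 surrogate
    i += len;
  }
  return true;
}
```

Logging the offending field name and its raw bytes whenever either check fires would show whether the value really contains nulls or some other garbage, for example an uninitialized buffer.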