Details
-
Improvement
-
Status: Closed
-
Major
-
Resolution: Fixed
-
None
-
None
-
Reviewed
-
typedbytes: datatypes should be derived from data
Description
FROM (
FROM src
SELECT TRANSFORM(src.key, src.value) ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.TypedBytesSerDe'
RECORDWRITER 'org.apache.hadoop.hive.contrib.util.typedbytes.TypedBytesRecordWriter'
USING '/bin/cat'
AS (tkey, tvalue) ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.TypedBytesSerDe'
RECORDREADER 'org.apache.hadoop.hive.contrib.util.typedbytes.TypedBytesRecordReader'
) tmap
INSERT OVERWRITE TABLE dest1 SELECT tkey, tvalue;
The output is interpreted as a string - however, it is assumed that the script is retuning string data.
It would be useful if the reader and the deserializer can be decoupled.
The record reader (TypedBytesRecordReader) will read the typed data (independent of the output schema)
and then convert it according to the output schema.