Details
Type: Bug
Status: Closed
Priority: Minor
Resolution: Cannot Reproduce
Affects Version/s: 0.5.2
Fix Version/s: None
Environment:
Ubuntu 16.04 (kernel 4.4.0-146-generic)
Hadoop 3.1.2
Hivemall 0.5.2
Description
Hello, when I try to use Hivemall's regression toolkit, I get errors while building the feature representation.
I've been following this guide https://hivemall.incubator.apache.org/userguide/supervised_learning/tutorial.html and trying to reproduce it. I've used the code it provides, but I run into problems when I execute it.
My issue seems to be with this part of the guide:
create table if not exists purchase_history as
select 1 as id, "Saturday" as day_of_week, "male" as gender, 600 as price, "book" as category, 1 as label
union all
select 2 as id, "Friday" as day_of_week, "female" as gender, 4800 as price, "sports" as category, 0 as label
union all
select 3 as id, "Friday" as day_of_week, "other" as gender, 18000 as price, "entertainment" as category, 0 as label
union all
select 4 as id, "Thursday" as day_of_week, "male" as gender, 200 as price, "food" as category, 0 as label
union all
select 5 as id, "Wednesday" as day_of_week, "female" as gender, 1000 as price, "electronics" as category, 1 as label
;
create table if not exists training as
select
  id,
  array_concat( -- concatenate two arrays of quantitative and categorical features into single array
    quantitative_features(
      array("price"), -- quantitative feature names
      price -- corresponding column names
    ),
    categorical_features(
      array("day of week", "gender", "category"), -- categorical feature names
      day_of_week, gender, category -- corresponding column names
    )
  ) as features,
  label
from
  purchase_history
;
This is copied straight from the guide. When I run it, I get the following error:
:24,657 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
2019-05-16 10:09:24,692 INFO [main] org.apache.hadoop.hive.ql.exec.FileSinkOperator: Using serializer : class org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe[[[B@43b0ade]:[id, features, label]:[int, array<string>, int]] and formatter : org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat@2d66530f
2019-05-16 10:09:24,692 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: mapred.healthChecker.script.timeout is deprecated. Instead, use mapreduce.tasktracker.healthchecker.script.timeout
2019-05-16 10:09:24,706 INFO [main] org.apache.hadoop.hive.ql.exec.Utilities: PLAN PATH = hdfs://localhost:9000/tmp/hive/jshaw6/970622c3-bfd6-407c-93f9-953184696ebf/hive_2019-05-16_10-09-11_357_6286825630727418123-1/-mr-10006/3f3f0199-3af0-40dd-abb4-6bad4df12ba7/map.xml
2019-05-16 10:09:24,745 ERROR [main] org.apache.hadoop.hive.ql.exec.mr.ExecMapper: Hit error while closing operators - failing tree
2019-05-16 10:09:24,746 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.lang.RuntimeException: Hive Runtime Error while closing operators
    at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:211)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
    at org.apache.hadoop.hive.ql.exec.mr.ExecMapRunner.run(ExecMapRunner.java:37)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:465)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:349)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:168)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Error evaluating id
    at org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(VectorSelectOperator.java:149)
    at org.apache.hadoop.hive.ql.exec.Operator.vectorForward(Operator.java:966)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:939)
    at org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:125)
    at org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.closeOp(VectorMapOperator.java:990)
    at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:733)
    at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:193)
    ... 9 more
Caused by: java.lang.RuntimeException: org.apache.hadoop.hive.ql.exec.UDFArgumentException: argument must be a constant value: array<string>
    at org.apache.hadoop.hive.ql.exec.vector.udf.VectorUDFAdaptor.evaluate(VectorUDFAdaptor.java:106)
    at org.apache.hadoop.hive.ql.exec.vector.expressions.VectorExpression.evaluateChildren(VectorExpression.java:271)
    at org.apache.hadoop.hive.ql.exec.vector.udf.VectorUDFAdaptor.evaluate(VectorUDFAdaptor.java:111)
    at org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(VectorSelectOperator.java:146)
    ... 15 more
Caused by: org.apache.hadoop.hive.ql.exec.UDFArgumentException: argument must be a constant value: array<string>
    at hivemall.utils.hadoop.HiveUtils.getConstStringArray(HiveUtils.java:502)
    at hivemall.ftvec.trans.QuantitativeFeaturesUDF.initialize(QuantitativeFeaturesUDF.java:80)
    at org.apache.hadoop.hive.ql.exec.vector.udf.VectorUDFAdaptor.init(VectorUDFAdaptor.java:89)
    at org.apache.hadoop.hive.ql.exec.vector.udf.VectorUDFAdaptor.evaluate(VectorUDFAdaptor.java:104)
    ... 18 more
However, when I run the same SELECT query on its own, without creating a table, I get the correct results:
1 "price:600.0","day of week#Saturday","gender#male","category#book" 1
2 "price:4800.0","day of week#Friday","gender#female","category#sports" 0
3 "price:18000.0","day of week#Friday","gender#other","category#entertainment" 0
4 "price:200.0","day of week#Thursday","gender#male","category#food" 0
5 "price:1000.0","day of week#Wednesday","gender#female","category#electronics" 1
Any idea why I am not able to save these results to a table?
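One possible workaround, not confirmed anywhere in this report: every frame in the failing chain passes through Hive's vectorized execution path (`VectorUDFAdaptor` calling `QuantitativeFeaturesUDF.initialize`), and the error says the `array("price")` argument is not seen as a constant there. Assuming that is the cause, disabling vectorization for the session before running the `CREATE TABLE ... AS SELECT` may avoid it. This could also explain why the standalone SELECT succeeds, if Hive runs it without launching the vectorized map task, but that is a guess. `hive.vectorized.execution.enabled` is a standard Hive property; everything else below is the guide's query unchanged apart from layout.

```sql
-- Assumption: the failure comes from Hive's vectorized execution path
-- (VectorUDFAdaptor in the stack trace), not from the query itself.
set hive.vectorized.execution.enabled=false;

create table if not exists training as
select
  id,
  array_concat(
    quantitative_features(array("price"), price),
    categorical_features(array("day of week", "gender", "category"), day_of_week, gender, category)
  ) as features,
  label
from
  purchase_history
;
```

The `set` statement only affects the current session, so other queries keep vectorized execution unless they set it themselves.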