Details
Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 1.0.0
Fix Version/s: None
Component/s: None
Description
We encountered a problem when using the Hive hook in production: the hook times out after 180 seconds when the Kafka server goes down unexpectedly. Setting the properties zookeeper.connection.timeout.ms and zookeeper.session.timeout.ms does not solve the problem.
We found the following warnings in hive.log. It seems some configurations for the Kafka notification producer are invalid:
The configuration 'zookeeper.connection.timeout.ms' was supplied but isn't a known config.
The configuration 'zookeeper.session.timeout.ms' was supplied but isn't a known config.
When the Kafka server goes down unexpectedly, the Atlas hook throws a TimeoutException caused by a failure to update metadata:
org.apache.atlas.notification.NotificationException: java.util.concurrent.ExecutionException: org.apache.kafka.common.errors.TimeoutException: Failed to update metadata after 60000 ms.
    at org.apache.atlas.kafka.KafkaNotification.sendInternalToProducer(KafkaNotification.java:220)
    at org.apache.atlas.kafka.KafkaNotification.sendInternal(KafkaNotification.java:182)
    at org.apache.atlas.notification.AbstractNotification.send(AbstractNotification.java:89)
    at org.apache.atlas.hook.AtlasHook.notifyEntitiesInternal(AtlasHook.java:133)
    at org.apache.atlas.hook.AtlasHook.notifyEntities(AtlasHook.java:118)
    at org.apache.atlas.hook.AtlasHook.notifyEntities(AtlasHook.java:171)
    at org.apache.atlas.hive.hook.HiveHook.run(HiveHook.java:156)
    at org.apache.atlas.hive.hook.HiveHook.run(HiveHook.java:52)
    at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1804)
    at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1424)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1208)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1198)
    at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:220)
    at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:172)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:383)
    at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:775)
    at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:693)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:628)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: java.util.concurrent.ExecutionException: org.apache.kafka.common.errors.TimeoutException: Failed to update metadata after 60000 ms.
    at org.apache.kafka.clients.producer.KafkaProducer$FutureFailure.<init>(KafkaProducer.java:1124)
    at org.apache.kafka.clients.producer.KafkaProducer.doSend(KafkaProducer.java:823)
    at org.apache.kafka.clients.producer.KafkaProducer.send(KafkaProducer.java:760)
    at org.apache.kafka.clients.producer.KafkaProducer.send(KafkaProducer.java:648)
    at org.apache.atlas.kafka.KafkaNotification.sendInternalToProducer(KafkaNotification.java:197)
    ... 23 more
Caused by: org.apache.kafka.common.errors.TimeoutException: Failed to update metadata after 60000 ms.
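For reference, below is a minimal standalone sketch (not Atlas code; the broker address, topic, value, and timeout are illustrative) of how the Kafka producer treats these settings: zookeeper.* keys are not producer configs and are ignored with the warning shown above, while max.block.ms is the producer config that bounds how long send() waits for metadata before failing.

import java.util.Properties;
import java.util.concurrent.ExecutionException;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;

public class MaxBlockMsSketch {
    public static void main(String[] args) throws InterruptedException {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder broker address
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringSerializer");
        // zookeeper.* keys are not producer configs; the producer logs
        // "was supplied but isn't a known config" and ignores them.
        props.put("zookeeper.connection.timeout.ms", "30000");
        // max.block.ms bounds how long send() waits for metadata (default 60000 ms).
        props.put(ProducerConfig.MAX_BLOCK_MS_CONFIG, "15000");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // With the broker unreachable, send() blocks for ~15 s instead of 60 s,
            // and the returned future fails with the TimeoutException seen above.
            producer.send(new ProducerRecord<>("ATLAS_HOOK", "test-message")).get();
        } catch (ExecutionException e) {
            System.err.println("Send failed: " + e.getCause());
        }
    }
}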
We propose adding a property, atlas.kafka.max.block.ms, to atlas-application.properties to control how long the KafkaProducer blocks when it fails to update metadata.
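For example (the value below is illustrative, assuming Atlas forwards atlas.kafka.*-prefixed settings to the Kafka producer as it does for other Kafka configs):

# atlas-application.properties: fail fast after 15 seconds instead of the 60000 ms producer default
atlas.kafka.max.block.ms=15000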