Details
-
Bug
-
Status: Open
-
Major
-
Resolution: Unresolved
-
None
-
None
Description
Hi,
I'm trying to launch gobblin-mapreduce.sh with my job config, which is almost a copy/paste from your wiki: https://github.com/linkedin/gobblin/wiki/Kafka-HDFS-Ingestion
I'm launching gobblin with command:
```
bin/gobblin-mapreduce.sh --conf jobs/dump-kafka.properties --workdir work/
```
But the job fails with the following repeated error in all mappers:
```
java.lang.NoClassDefFoundError: kafka/common/TopicAndPartition
at gobblin.source.extractor.extract.kafka.KafkaWrapper$KafkaOldAPI.createFetchRequest(KafkaWrapper.java:401)
at gobblin.source.extractor.extract.kafka.KafkaWrapper$KafkaOldAPI.fetchNextMessageBuffer(KafkaWrapper.java:333)
at gobblin.source.extractor.extract.kafka.KafkaWrapper.fetchNextMessageBuffer(KafkaWrapper.java:136)
at gobblin.source.extractor.extract.kafka.KafkaExtractor.fetchNextMessageBuffer(KafkaExtractor.java:239)
at gobblin.source.extractor.extract.kafka.KafkaExtractor.readRecordImpl(KafkaExtractor.java:125)
at gobblin.instrumented.extractor.InstrumentedExtractorBase.readRecord(InstrumentedExtractorBase.java:121)
at gobblin.instrumented.extractor.InstrumentedExtractor.readRecord(InstrumentedExtractor.java:34)
at gobblin.runtime.LimitingExtractorDecorator.readRecord(LimitingExtractorDecorator.java:69)
at gobblin.instrumented.extractor.InstrumentedExtractorDecorator.readRecordImpl(InstrumentedExtractorDecorator.java:64)
at gobblin.instrumented.extractor.InstrumentedExtractorDecorator.readRecord(InstrumentedExtractorDecorator.java:57)
at gobblin.runtime.Task.run(Task.java:169)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassNotFoundException: kafka.common.TopicAndPartition
at java.net.URLClassLoader$1.run(URLClassLoader.java:372)
at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:360)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 14 more
```
It seems that Gobblin does not include the Kafka (and other) jars in the MapReduce tasks' classpath.
I also tried passing all the jars in the lib/ directory as libjars with the command:
```
bin/gobblin-mapreduce.sh --conf jobs/dump-kafka.properties --workdir work/ --jars `ls lib/* | tr '\n' ','`
```
But this time I get an error caused by clashing Guava libraries:
```
Error: java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:131)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:722)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1556)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:408)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:129)
... 7 more
Caused by: java.lang.NoSuchMethodError: com.google.common.collect.Sets.newConcurrentHashSet()Ljava/util/Set;
at gobblin.configuration.SourceState.<clinit>(SourceState.java:54)
at gobblin.runtime.mapreduce.MRJobLauncher$TaskRunner.<init>(MRJobLauncher.java:554)
... 12 more
```
I have hadoop 2.4.0, which uses guava 11.0.2, while the one in lib/ is guava-15.0.
Github Url : https://github.com/linkedin/gobblin/issues/386
Github Reporter : kzarzycki-advertine
Github Created At : 2015-10-15T07:29:37Z
Github Updated At : 2016-03-10T00:36:08Z
Comments
kzarzycki wrote on 2015-10-17T06:41:39Z : Hey, does anyone have comments on this ticket? I'd be grateful for your help with this. Thank you!
Krzysztof
Github Url : https://github.com/linkedin/gobblin/issues/386#issuecomment-148891116
zliu41 wrote on 2015-10-19T19:05:02Z : Hi @kzarzycki, it seems the jars in `lib` were somehow not correctly added to the Hadoop classpath. I couldn't reproduce your errors (if you run `gobblin-mapreduce.sh` from the parent dir of `lib` it should work automatically), so I can only make some guesses. In `gobblin-mapreduce.sh`, can you replace the line
`export HADOOP_CLASSPATH=$GOBBLIN_DEP_JARS:$HADOOP_CLASSPATH`
with one of the following:
```
export HADOOP_CLASSPATH=lib:$HADOOP_CLASSPATH
export HADOOP_CLASSPATH=lib
export HADOOP_CLASSPATH=.:$HADOOP_CLASSPATH
export HADOOP_CLASSPATH=.
```
Then run `gobblin-mapreduce.sh` with or without option `--jars [path-to-lib]`.
Not sure which combination is correct so you can try these options.
Github Url : https://github.com/linkedin/gobblin/issues/386#issuecomment-149314749
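Before trying the variants above, it may help to see which entries of `HADOOP_CLASSPATH` actually resolve to something on disk. The following is a minimal sketch, not part of Gobblin: the function name `check_classpath` is made up for illustration, and it only inspects the classpath on the machine where the script is launched, not what the mappers see on the cluster.

```shell
# Report, for each entry of a colon-separated classpath, whether it exists on disk.
# Hypothetical helper for illustration; not part of gobblin-mapreduce.sh.
# Usage: check_classpath "$HADOOP_CLASSPATH"
check_classpath() (
  printf '%s\n' "$1" | tr ':' '\n' | while IFS= read -r entry; do
    [ -n "$entry" ] || continue          # skip empty entries such as "a::b"
    case $entry in
      */\*) path=${entry%/\*} ;;         # Java wildcard entry "dir/*": check the directory
      *)    path=$entry ;;
    esac
    if [ -e "$path" ]; then
      printf 'ok       %s\n' "$entry"
    else
      printf 'missing  %s\n' "$entry"
    fi
  done
)
```

Relative entries such as `lib` or `.` are resolved against the current directory, so the result depends on where the script is run from.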
rsimiciuc wrote on 2015-11-02T15:09:58Z : I have the same problem. Any solution?
Github Url : https://github.com/linkedin/gobblin/issues/386#issuecomment-153047233
klyr wrote on 2015-11-17T09:30:12Z : Hi @kzarzycki-advertine,
I had the same problem and struggled a while to fix it.
In my case it was a problem with the hive-exec library embedding the (not shaded) guava library. It took precedence over the newer guava library.
Here is the related JIRA issue: https://issues.apache.org/jira/browse/HIVE-5733
A quick fix is to remove `hive-exec-0.13.1.jar`, or to not include it in the `--jars` option.
Upgrading to hive version > 1.2.0 may also work.
I hope it will help.
Github Url : https://github.com/linkedin/gobblin/issues/386#issuecomment-157318430
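To check whether a jar other than Guava itself bundles Guava classes, as described for hive-exec above, one option is to scan the jar listings for a Guava class file. This is a rough sketch under assumptions: `jars_containing` is a hypothetical helper name, `unzip` is assumed to be installed, and the match is a plain substring search over the listing.

```shell
# Print every jar under DIR whose table of contents mentions ENTRY
# (a path inside the jar, e.g. com/google/common/collect/Sets.class).
# Hypothetical helper for illustration; assumes the unzip tool is available.
# Usage: jars_containing DIR ENTRY
jars_containing() (
  dir=$1
  entry=$2
  for jar in "$dir"/*.jar; do
    [ -e "$jar" ] || continue            # directory contained no jars
    # unzip -l lists the archive contents; grep -F matches ENTRY literally.
    if unzip -l "$jar" 2>/dev/null | grep -Fq -- "$entry"; then
      printf '%s\n' "$jar"
    fi
  done
)
```

For example, `jars_containing lib com/google/common/collect/Sets.class` lists the jars in `lib` that carry their own copy of that class. The same call with `kafka/common/TopicAndPartition.class` shows which jar would need to reach the mappers for the original `NoClassDefFoundError`.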
gilmichlin wrote on 2015-11-18T16:41:23Z : I can confirm that I can reproduce this on HDP 2.3.0, building with:
./gradlew clean build -PuseHadoop2 -PhadoopVersion=2.7.1 -PhiveVersion=1.2.1
Upgrading to Hive 1.2.1 did not work for me. I just used:
--jars `ls lib/* | grep -v hive | tr '\n' ','`
and it worked.
Github Url : https://github.com/linkedin/gobblin/issues/386#issuecomment-157771981
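The `ls | grep -v | tr` pipeline above leaves a trailing comma and depends on correct quoting of `'\n'`. A small helper can build the same list more defensively. This is only a sketch: `jar_list` is a hypothetical name, and unlike the pipeline it matches the exclude pattern against the file name only, not the whole path.

```shell
# Build a comma-separated list of the jars in DIR, skipping any whose file name
# matches EXCLUDE_REGEX (a grep extended regex). No trailing comma is produced.
# Hypothetical helper for illustration; not part of Gobblin.
# Usage: jar_list DIR [EXCLUDE_REGEX]
jar_list() (
  dir=$1
  exclude=${2:-}
  list=
  for jar in "$dir"/*.jar; do
    [ -e "$jar" ] || continue            # directory contained no jars
    name=${jar##*/}                      # strip the directory part
    if [ -n "$exclude" ] && printf '%s\n' "$name" | grep -Eq -- "$exclude"; then
      continue                           # excluded, e.g. the hive-* jars
    fi
    list=${list:+$list,}$jar             # add a comma only between items
  done
  printf '%s\n' "$list"
)
```

It could then be used as `--jars "$(jar_list lib hive)"` in place of the backtick expression.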
zliu41 wrote on 2015-11-18T16:58:19Z : @klyr @gilmichlin thanks for posting! I'll see if updating the hive version works.
Github Url : https://github.com/linkedin/gobblin/issues/386#issuecomment-157778024
zliu41 wrote on 2015-11-18T22:28:14Z : I've updated the hive version to 1.2.1. #466
Github Url : https://github.com/linkedin/gobblin/issues/386#issuecomment-157885351
gilmichlin wrote on 2015-11-18T22:37:41Z : Hive 1.2.1 did not work for me with HDP 2.3.0; only this did:
--jars `ls lib/* | grep -v hive | tr '\n' ','`
Github Url : https://github.com/linkedin/gobblin/issues/386#issuecomment-157887533
zliu41 wrote on 2015-11-19T18:22:04Z : @gilmichlin is it still because of the Guava dependency? Based on HIVE-5733 it shouldn't be a problem with Hive 1.2.0 or later.
If so, is there any hive version that works for you?
Github Url : https://github.com/linkedin/gobblin/issues/386#issuecomment-158145837
gilmichlin wrote on 2015-11-20T18:46:14Z : I am going to check it out over the weekend
Github Url : https://github.com/linkedin/gobblin/issues/386#issuecomment-158488974
gilmichlin wrote on 2015-11-23T18:58:26Z : Still Guava.
You will be able to reproduce it by loading an HDP 2.3.X VM and building with the following:
```
./gradlew clean build -PuseHadoop2 -PhadoopVersion=2.7.1 -PhiveVersion=1.2.1
```
running the following wikipedia example
```
/bin/gobblin-mapreduce.sh --conf /opt/gobblin/job/wikipedia.pull --workdir /user/root/gobblin/ --jars `ls lib/* | tr '\n' ','`
```
will give the following error
```
2015-11-23 18:50:15,857 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:134)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:747)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:132)
... 7 more
Caused by: java.lang.NoSuchMethodError: com.google.common.collect.Sets.newConcurrentHashSet()Ljava/util/Set;
at gobblin.configuration.SourceState.<clinit>(SourceState.java:54)
at gobblin.runtime.mapreduce.MRJobLauncher$TaskRunner.<init>(MRJobLauncher.java:525)
... 12 more
```
listing Hive jars
```
ls -l lib/ | grep hive
-rw-r--r-- 1 root root 47713 2015-11-18 16:17 hive-ant-1.2.1.jar
-rw-r--r-- 1 root root 292289 2015-11-18 16:17 hive-common-1.2.1.jar
-rw-r--r-- 1 root root 20599029 2015-11-18 16:17 hive-exec-1.2.1.jar
-rw-r--r-- 1 root root 100580 2015-11-18 16:17 hive-jdbc-1.2.1.jar
-rw-r--r-- 1 root root 5505100 2015-11-18 16:17 hive-metastore-1.2.1.jar
-rw-r--r-- 1 root root 916706 2015-11-18 16:17 hive-serde-1.2.1.jar
-rw-r--r-- 1 root root 1878543 2015-11-18 16:17 hive-service-1.2.1.jar
-rw-r--r-- 1 root root 32390 2015-11-18 16:17 hive-shims-0.20S-1.2.1.jar
-rw-r--r-- 1 root root 60070 2015-11-18 16:17 hive-shims-0.23-1.2.1.jar
-rw-r--r-- 1 root root 8949 2015-11-18 16:17 hive-shims-1.2.1.jar
-rw-r--r-- 1 root root 108914 2015-11-18 16:17 hive-shims-common-1.2.1.jar
-rw-r--r-- 1 root root 13065 2015-11-18 16:17 hive-shims-scheduler-1.2.1.jar
```
Running it like this works:
```
./bin/gobblin-mapreduce.sh --conf /opt/gobblin/job/wikipedia.pull --workdir /user/root/gobblin/ --jars `ls lib/* | grep -v hive | tr '\n' ','`
```
Github Url : https://github.com/linkedin/gobblin/issues/386#issuecomment-159027977
rsimiciuc wrote on 2015-11-23T19:11:31Z : I had the same problem running Gobblin on CDH5, but I managed to solve it by shading Guava.
Github Url : https://github.com/linkedin/gobblin/issues/386#issuecomment-159031634
x10ba wrote on 2015-12-05T01:32:53Z : Hi, I think my error is similar to this thread, so putting it here (not sure if I need to change my properties):
Exception in thread "main" java.lang.RuntimeException: java.lang.ClassNotFoundException: gobblin.source.extractor.extract.kafka.kafkaSimpleSource
Current sys:
centos
Invoke:
[bin]$ ./gobblin-mapreduce.sh --conf ~/gobblin/gobblin-dist/conf/gobblin-mapreduce.properties --workdir ~/gobblin/work --jars ~/gobblin/gobblin-dist/lib/gobblin-core.jar
kafkaSimpleSource lives in the gobblin-core.jar
thanks,
x10ba
Github Url : https://github.com/linkedin/gobblin/issues/386#issuecomment-162124443
qizongjun wrote on 2016-03-09T23:24:22Z : Anyone with luck on this? I am facing the same Kafka problem. I find kafka jar inside gobblin/lib there, and it contains TopicAndPartition.class.
I am using latest Gobblin code.
I tried removing the hive-exec.jar too. It did not work for me.
2016-03-09 22:35:01,845 ERROR [TaskExecutor-0] gobblin.runtime.Task: Task task_kafka2hdfs_1457562888703_1 failed
java.lang.NoClassDefFoundError: kafka/common/TopicAndPartition
at gobblin.source.extractor.extract.kafka.KafkaWrapper$KafkaOldAPI.createFetchRequest(KafkaWrapper.java:401)
at gobblin.source.extractor.extract.kafka.KafkaWrapper$KafkaOldAPI.fetchNextMessageBuffer(KafkaWrapper.java:333)
at gobblin.source.extractor.extract.kafka.KafkaWrapper.fetchNextMessageBuffer(KafkaWrapper.java:136)
at gobblin.source.extractor.extract.kafka.KafkaExtractor.fetchNextMessageBuffer(KafkaExtractor.java:227)
at gobblin.source.extractor.extract.kafka.KafkaExtractor.readRecordImpl(KafkaExtractor.java:123)
at gobblin.instrumented.extractor.InstrumentedExtractorBase.readRecord(InstrumentedExtractorBase.java:121)
at gobblin.instrumented.extractor.InstrumentedExtractor.readRecord(InstrumentedExtractor.java:34)
at gobblin.instrumented.extractor.InstrumentedExtractorDecorator.readRecordImpl(InstrumentedExtractorDecorator.java:64)
at gobblin.instrumented.extractor.InstrumentedExtractorDecorator.readRecord(InstrumentedExtractorDecorator.java:57)
at gobblin.runtime.Task.run(Task.java:172)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassNotFoundException: kafka.common.TopicAndPartition
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
... 13 more
Github Url : https://github.com/linkedin/gobblin/issues/386#issuecomment-194562251
stakiar wrote on 2016-03-10T00:36:08Z : Adding `kafka_2.11-0.8.2.1.jar` to the `--jars` option when running `bin/gobblin-mapreduce.sh` fixes this.
Github Url : https://github.com/linkedin/gobblin/issues/386#issuecomment-194589153