Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Duplicate
Affects Version/s: 0.9.1-incubating
None
Environment:
CentOS 1.6
storm-core-0.9.1-incubating
Apache Kafka 0.8.1
Zookeeper 3.4.6
Description
I have one topology running on my production cluster. It ran for some weeks without failures, but a few days ago my supervisor died with this error:
tail /var/log/storm/supervisor.log
2014-07-27 23:15:26 b.s.event [ERROR] Error when processing event
java.lang.RuntimeException: java.io.EOFException
at backtype.storm.utils.Utils.deserialize(Utils.java:86) ~[storm-core-0.9.1-incubating-mmx2.jar:0.9.1-incubating-mmx2]
at backtype.storm.utils.LocalState.snapshot(LocalState.java:45) ~[storm-core-0.9.1-incubating-mmx2.jar:0.9.1-incubating-mmx2]
at backtype.storm.utils.LocalState.get(LocalState.java:56) ~[storm-core-0.9.1-incubating-mmx2.jar:0.9.1-incubating-mmx2]
at backtype.storm.daemon.supervisor$read_worker_heartbeat.invoke(supervisor.clj:77) ~[storm-core-0.9.1-incubating-mmx2.jar:0.9.1-incubating-mmx2]
at backtype.storm.daemon.supervisor$read_worker_heartbeats$iter__4842__4846$fn__4847.invoke(supervisor.clj:90) ~[na:na]
at clojure.lang.LazySeq.sval(LazySeq.java:42) ~[clojure-1.4.0.jar:na]
at clojure.lang.LazySeq.seq(LazySeq.java:60) ~[clojure-1.4.0.jar:na]
at clojure.lang.RT.seq(RT.java:473) ~[clojure-1.4.0.jar:na]
at clojure.core$seq.invoke(core.clj:133) ~[clojure-1.4.0.jar:na]
at clojure.core$dorun.invoke(core.clj:2725) ~[clojure-1.4.0.jar:na]
at clojure.core$doall.invoke(core.clj:2741) ~[clojure-1.4.0.jar:na]
at backtype.storm.daemon.supervisor$read_worker_heartbeats.invoke(supervisor.clj:89) ~[storm-core-0.9.1-incubating-mmx2.jar:0.9.1-incubating-mmx2]
at backtype.storm.daemon.supervisor$read_allocated_workers.invoke(supervisor.clj:106) ~[storm-core-0.9.1-incubating-mmx2.jar:0.9.1-incubating-mmx2]
at backtype.storm.daemon.supervisor$sync_processes.invoke(supervisor.clj:209) ~[storm-core-0.9.1-incubating-mmx2.jar:0.9.1-incubating-mmx2]
at clojure.lang.AFn.applyToHelper(AFn.java:161) [clojure-1.4.0.jar:na]
at clojure.lang.AFn.applyTo(AFn.java:151) [clojure-1.4.0.jar:na]
at clojure.core$apply.invoke(core.clj:603) ~[clojure-1.4.0.jar:na]
at clojure.core$partial$fn__4070.doInvoke(core.clj:2343) ~[clojure-1.4.0.jar:na]
at clojure.lang.RestFn.invoke(RestFn.java:397) ~[clojure-1.4.0.jar:na]
at backtype.storm.event$event_manager$fn__2593.invoke(event.clj:39) ~[na:na]
at clojure.lang.AFn.run(AFn.java:24) [clojure-1.4.0.jar:na]
at java.lang.Thread.run(Unknown Source) [na:1.7.0_03]
Caused by: java.io.EOFException: null
at java.io.ObjectInputStream$PeekInputStream.readFully(Unknown Source) ~[na:1.7.0_03]
at java.io.ObjectInputStream$BlockDataInputStream.readShort(Unknown Source) ~[na:1.7.0_03]
at java.io.ObjectInputStream.readStreamHeader(Unknown Source) ~[na:1.7.0_03]
at java.io.ObjectInputStream.<init>(Unknown Source) ~[na:1.7.0_03]
at backtype.storm.utils.Utils.deserialize(Utils.java:81) ~[storm-core-0.9.1-incubating-mmx2.jar:0.9.1-incubating-mmx2]
... 21 common frames omitted
2014-07-27 23:15:26 b.s.util [INFO] Halting process: ("Error when processing an event")
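A note on the trace: the `Caused by` frames show the `EOFException` being thrown from the `ObjectInputStream` constructor while it reads the stream header. Java serialization does this when the input ends before the 4-byte header, so the trace is consistent with a local state file that was empty or truncated (the report does not confirm this). The sketch below reproduces only that failure mode; `EmptyStateDemo` and its `deserialize` helper are hypothetical names that imitate the shape of the call in the trace, not Storm's actual code.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.ObjectInputStream;

// Hypothetical demo class, not part of Storm.
class EmptyStateDemo {
    // Wraps the bytes in an ObjectInputStream and reads one object,
    // rethrowing any failure as a RuntimeException (as seen in the trace).
    static Object deserialize(byte[] serialized) {
        try (ObjectInputStream ois =
                 new ObjectInputStream(new ByteArrayInputStream(serialized))) {
            return ois.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        try {
            // Zero bytes stands in for an empty state file.
            deserialize(new byte[0]);
        } catch (RuntimeException e) {
            // The constructor fails before any object is read.
            System.out.println("cause is EOFException: "
                + (e.getCause() instanceof EOFException));
        }
    }
}
```

If this is what happened, every retry would fail the same way until the damaged file is removed or rewritten, which would match the repeated crashes described below.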
The supervisor tried to restart, but it died with the same error every time.
I fixed the problem by deleting my temporary Storm directory, "/tmp/storm", but the next day I found the same problem again. I deleted the directory again and now the topology runs fine, but I don't think the error "[ERROR] Error when processing event" is normal, so I decided to report it.
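Before deleting the whole directory, it may help to see which files are actually damaged. The following is a rough sketch that lists zero-length regular files under a directory. It assumes the corruption shows up as an empty file, which this report does not confirm, and `EmptyFileScan` is a hypothetical helper, not a Storm tool. It uses `Files.walk`, so it needs Java 8 or later.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical diagnostic helper, not part of Storm.
class EmptyFileScan {
    // Returns every regular file under root whose size is zero bytes.
    static List<Path> findEmptyFiles(Path root) throws IOException {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths
                .filter(Files::isRegularFile)
                .filter(p -> p.toFile().length() == 0)
                .sorted()
                .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Defaults to the directory named in this report.
        Path root = Paths.get(args.length > 0 ? args[0] : "/tmp/storm");
        for (Path p : findEmptyFiles(root)) {
            System.out.println(p);
        }
    }
}
```

Run it with the Storm local directory as the only argument, for example `java EmptyFileScan /tmp/storm` after compiling. Any path it prints is a candidate for the file the supervisor could not deserialize.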