Created 01-18-2017 09:02 AM
Hello
Yesterday we upgraded our Ambari installation from 2.2.2.0 to 2.4.2.0. Ambari is managing an HDP 2.3.6 cluster. After the upgrade, which followed all the instructions at
http://docs.hortonworks.com/HDPDocuments/Ambari-2.4.2.0/bk_ambari-upgrade/content/upgrade_ambari.htm...
Storm Nimbus crashes on start with this exception:
2017-01-17 19:39:29.871 b.s.zookeeper [INFO] Accepting leadership, all active topology found localy.
2017-01-17 19:39:29.928 b.s.d.nimbus [INFO] Starting Nimbus server...
2017-01-17 19:39:30.860 b.s.d.nimbus [ERROR] Error when processing event
java.lang.NullPointerException
    at clojure.lang.Numbers.ops(Numbers.java:961) ~[clojure-1.6.0.jar:?]
    at clojure.lang.Numbers.isZero(Numbers.java:90) ~[clojure-1.6.0.jar:?]
    at backtype.storm.util$partition_fixed.invoke(util.clj:900) ~[storm-core-0.10.0.2.3.6.0-3796.jar:0.10.0.2.3.6.0-3796]
    at clojure.lang.AFn.applyToHelper(AFn.java:156) ~[clojure-1.6.0.jar:?]
    at clojure.lang.AFn.applyTo(AFn.java:144) ~[clojure-1.6.0.jar:?]
    at clojure.core$apply.invoke(core.clj:624) ~[clojure-1.6.0.jar:?]
    at clojure.lang.AFn.applyToHelper(AFn.java:156) ~[clojure-1.6.0.jar:?]
    at clojure.lang.RestFn.applyTo(RestFn.java:132) ~[clojure-1.6.0.jar:?]
    at clojure.core$apply.invoke(core.clj:626) ~[clojure-1.6.0.jar:?]
    at clojure.core$partial$fn__4228.doInvoke(core.clj:2468) ~[clojure-1.6.0.jar:?]
    at clojure.lang.RestFn.invoke(RestFn.java:408) ~[clojure-1.6.0.jar:?]
    at backtype.storm.util$map_val$iter__1807__1811$fn__1812.invoke(util.clj:305) ~[storm-core-0.10.0.2.3.6.0-3796.jar:0.10.0.2.3.6.0-3796]
    at clojure.lang.LazySeq.sval(LazySeq.java:40) ~[clojure-1.6.0.jar:?]
    at clojure.lang.LazySeq.seq(LazySeq.java:49) ~[clojure-1.6.0.jar:?]
    at clojure.lang.Cons.next(Cons.java:39) ~[clojure-1.6.0.jar:?]
    at clojure.lang.RT.next(RT.java:598) ~[clojure-1.6.0.jar:?]
    at clojure.core$next.invoke(core.clj:64) ~[clojure-1.6.0.jar:?]
    at clojure.core.protocols$fn__6086.invoke(protocols.clj:146) ~[clojure-1.6.0.jar:?]
    at clojure.core.protocols$fn__6057$G__6052__6066.invoke(protocols.clj:19) ~[clojure-1.6.0.jar:?]
    at clojure.core.protocols$seq_reduce.invoke(protocols.clj:31) ~[clojure-1.6.0.jar:?]
    at clojure.core.protocols$fn__6078.invoke(protocols.clj:54) ~[clojure-1.6.0.jar:?]
    at clojure.core.protocols$fn__6031$G__6026__6044.invoke(protocols.clj:13) ~[clojure-1.6.0.jar:?]
    at clojure.core$reduce.invoke(core.clj:6289) ~[clojure-1.6.0.jar:?]
    at clojure.core$into.invoke(core.clj:6341) ~[clojure-1.6.0.jar:?]
    at backtype.storm.util$map_val.invoke(util.clj:304) ~[storm-core-0.10.0.2.3.6.0-3796.jar:0.10.0.2.3.6.0-3796]
    at backtype.storm.daemon.nimbus$compute_executors.invoke(nimbus.clj:491) ~[storm-core-0.10.0.2.3.6.0-3796.jar:0.10.0.2.3.6.0-3796]
    at backtype.storm.daemon.nimbus$compute_executor__GT_component.invoke(nimbus.clj:502) ~[storm-core-0.10.0.2.3.6.0-3796.jar:0.10.0.2.3.6.0-3796]
    at backtype.storm.daemon.nimbus$read_topology_details.invoke(nimbus.clj:394) ~[storm-core-0.10.0.2.3.6.0-3796.jar:0.10.0.2.3.6.0-3796]
    at backtype.storm.daemon.nimbus$mk_assignments$iter__7809__7813$fn__7814.invoke(nimbus.clj:722) ~[storm-core-0.10.0.2.3.6.0-3796.jar:0.10.0.2.3.6.0-3796]
    at clojure.lang.LazySeq.sval(LazySeq.java:40) ~[clojure-1.6.0.jar:?]
    at clojure.lang.LazySeq.seq(LazySeq.java:49) ~[clojure-1.6.0.jar:?]
    at clojure.lang.RT.seq(RT.java:484) ~[clojure-1.6.0.jar:?]
    at clojure.core$seq.invoke(core.clj:133) ~[clojure-1.6.0.jar:?]
    at clojure.core.protocols$seq_reduce.invoke(protocols.clj:30) ~[clojure-1.6.0.jar:?]
    at clojure.core.protocols$fn__6078.invoke(protocols.clj:54) ~[clojure-1.6.0.jar:?]
    at clojure.core.protocols$fn__6031$G__6026__6044.invoke(protocols.clj:13) ~[clojure-1.6.0.jar:?]
    at clojure.core$reduce.invoke(core.clj:6289) ~[clojure-1.6.0.jar:?]
    at clojure.core$into.invoke(core.clj:6341) ~[clojure-1.6.0.jar:?]
    at backtype.storm.daemon.nimbus$mk_assignments.doInvoke(nimbus.clj:721) ~[storm-core-0.10.0.2.3.6.0-3796.jar:0.10.0.2.3.6.0-3796]
    at clojure.lang.RestFn.invoke(RestFn.java:410) ~[clojure-1.6.0.jar:?]
    at backtype.storm.daemon.nimbus$fn__8060$exec_fn__3866__auto____8061$fn__8068$fn__8069.invoke(nimbus.clj:1112) ~[storm-core-0.10.0.2.3.6.0-3796.jar:0.10.0.2.3.6.0-3796]
    at backtype.storm.daemon.nimbus$fn__8060$exec_fn__3866__auto____8061$fn__8068.invoke(nimbus.clj:1111) ~[storm-core-0.10.0.2.3.6.0-3796.jar:0.10.0.2.3.6.0-3796]
    at backtype.storm.timer$schedule_recurring$this__2489.invoke(timer.clj:102) ~[storm-core-0.10.0.2.3.6.0-3796.jar:0.10.0.2.3.6.0-3796]
    at backtype.storm.timer$mk_timer$fn__2472$fn__2473.invoke(timer.clj:50) [storm-core-0.10.0.2.3.6.0-3796.jar:0.10.0.2.3.6.0-3796]
    at backtype.storm.timer$mk_timer$fn__2472.invoke(timer.clj:42) [storm-core-0.10.0.2.3.6.0-3796.jar:0.10.0.2.3.6.0-3796]
    at clojure.lang.AFn.run(AFn.java:22) [clojure-1.6.0.jar:?]
    at java.lang.Thread.run(Thread.java:745) [?:1.8.0_40]
2017-01-17 19:39:30.873 b.s.util [ERROR] Halting process: ("Error when processing an event")
java.lang.RuntimeException: ("Error when processing an event")
    at backtype.storm.util$exit_process_BANG_.doInvoke(util.clj:336) [storm-core-0.10.0.2.3.6.0-3796.jar:0.10.0.2.3.6.0-3796]
    at clojure.lang.RestFn.invoke(RestFn.java:423) [clojure-1.6.0.jar:?]
    at backtype.storm.daemon.nimbus$nimbus_data$fn__7411.invoke(nimbus.clj:118) [storm-core-0.10.0.2.3.6.0-3796.jar:0.10.0.2.3.6.0-3796]
    at backtype.storm.timer$mk_timer$fn__2472$fn__2473.invoke(timer.clj:71) [storm-core-0.10.0.2.3.6.0-3796.jar:0.10.0.2.3.6.0-3796]
    at backtype.storm.timer$mk_timer$fn__2472.invoke(timer.clj:42) [storm-core-0.10.0.2.3.6.0-3796.jar:0.10.0.2.3.6.0-3796]
    at clojure.lang.AFn.run(AFn.java:22) [clojure-1.6.0.jar:?]
    at java.lang.Thread.run(Thread.java:745) [?:1.8.0_40]
2017-01-17 19:39:30.876 b.s.d.nimbus [INFO] Shutting down master
Why is this happening? The Storm version is 0.10.0.2.3.
Any hints on how to debug this issue more thoroughly? Right now we cannot deploy new topologies.
Created 01-18-2017 09:25 AM
@Davide Ferrari Unfortunately this is a bug.
To get out of this situation you would need to stop the running topologies, clear the nodes that Storm maintains in ZooKeeper, and restart Storm.
The relevant paths in ZooKeeper are /storm/storms and /storm/assignments.
Please note that I would be wary of doing this in production without understanding the impact.
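As a rough sketch of the ZooKeeper cleanup step, the Python snippet below only builds the command lines and prints them for review; it does not run anything. The zkCli.sh location and the ZooKeeper address are assumptions and need to be adjusted for the actual cluster. It also assumes a ZooKeeper 3.4-style CLI, where the recursive delete is rmr (newer releases use deleteall instead). Storm should be stopped before any of these are run.

```python
# Assumed values, not taken from the thread; adjust for the actual cluster.
ZK_CLI = "/usr/hdp/current/zookeeper-client/bin/zkCli.sh"  # assumed client location
ZK_SERVER = "zk-host.example.com:2181"                      # hypothetical ZooKeeper address
STORM_ZK_PATHS = ("/storm/storms", "/storm/assignments")    # paths named in the reply above


def cleanup_commands(zk_server=ZK_SERVER, zk_cli=ZK_CLI, paths=STORM_ZK_PATHS):
    """Return one recursive-delete command line per Storm znode path."""
    # 'rmr' is the recursive delete in the ZooKeeper 3.4 CLI.
    return [f"{zk_cli} -server {zk_server} rmr {path}" for path in paths]


if __name__ == "__main__":
    # Print the commands for manual review instead of executing them.
    for command in cleanup_commands():
        print(command)
```

Printing rather than executing keeps the destructive step a deliberate manual action, which seems sensible given the warning about production use.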
Created 01-19-2017 09:19 AM
Hello @Santhosh B Gowda
We fixed it by deleting the whole /storm path in ZooKeeper plus /var/hadoop/storm on the Nimbus hosts, and then redeploying the topologies. The only drawback is that we had to stop all the topologies for a few minutes, which caused a minor downtime. Thanks for the help.
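For anyone repeating the local-directory part of this fix, a more conservative variant is to rename the directory instead of deleting it, so the old Nimbus state can still be inspected or restored. The sketch below is only an illustration of that idea, not what was done in this thread; the helper name move_aside and the .bak- naming scheme are made up here, and the path in the usage line is the one mentioned above.

```python
import shutil
import time
from pathlib import Path


def move_aside(storm_local_dir, suffix=None):
    """Rename the directory to '<name>.bak-<suffix>' and return the new path.

    Returns None if the directory does not exist.
    """
    src = Path(storm_local_dir)
    if not src.is_dir():
        return None
    # Default to a timestamp so repeated runs do not collide.
    suffix = suffix or time.strftime("%Y%m%d-%H%M%S")
    dst = src.with_name(f"{src.name}.bak-{suffix}")
    if dst.exists():
        raise FileExistsError(f"backup target already exists: {dst}")
    shutil.move(str(src), str(dst))
    return dst


if __name__ == "__main__":
    # Run on each Nimbus host while Storm is stopped.
    print(move_aside("/var/hadoop/storm"))
```

Once Nimbus starts cleanly and the topologies are redeployed, the backup directory can be removed by hand.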