Updated Ambari to 2.4.2, Nimbus crashes on start. How to fix?

Hello,

Yesterday we upgraded our Ambari installation from 2.2.2.0 to 2.4.2.0. Ambari is managing an HDP 2.3.6 cluster. We followed all of these instructions:

http://docs.hortonworks.com/HDPDocuments/Ambari-2.4.2.0/bk_ambari-upgrade/content/upgrade_ambari.htm...

After the upgrade, Storm Nimbus crashes on start with this exception:

2017-01-17 19:39:29.871 b.s.zookeeper [INFO] Accepting leadership, all active topology found localy.
2017-01-17 19:39:29.928 b.s.d.nimbus [INFO] Starting Nimbus server...
2017-01-17 19:39:30.860 b.s.d.nimbus [ERROR] Error when processing event
java.lang.NullPointerException
    at clojure.lang.Numbers.ops(Numbers.java:961) ~[clojure-1.6.0.jar:?]
    at clojure.lang.Numbers.isZero(Numbers.java:90) ~[clojure-1.6.0.jar:?]
    at backtype.storm.util$partition_fixed.invoke(util.clj:900) ~[storm-core-0.10.0.2.3.6.0-3796.jar:0.10.0.2.3.6.0-3796]
    at clojure.lang.AFn.applyToHelper(AFn.java:156) ~[clojure-1.6.0.jar:?]
    at clojure.lang.AFn.applyTo(AFn.java:144) ~[clojure-1.6.0.jar:?]
    at clojure.core$apply.invoke(core.clj:624) ~[clojure-1.6.0.jar:?]
    at clojure.lang.AFn.applyToHelper(AFn.java:156) ~[clojure-1.6.0.jar:?]
    at clojure.lang.RestFn.applyTo(RestFn.java:132) ~[clojure-1.6.0.jar:?]
    at clojure.core$apply.invoke(core.clj:626) ~[clojure-1.6.0.jar:?]
    at clojure.core$partial$fn__4228.doInvoke(core.clj:2468) ~[clojure-1.6.0.jar:?]
    at clojure.lang.RestFn.invoke(RestFn.java:408) ~[clojure-1.6.0.jar:?]
    at backtype.storm.util$map_val$iter__1807__1811$fn__1812.invoke(util.clj:305) ~[storm-core-0.10.0.2.3.6.0-3796.jar:0.10.0.2.3.6.0-3796]
    at clojure.lang.LazySeq.sval(LazySeq.java:40) ~[clojure-1.6.0.jar:?]
    at clojure.lang.LazySeq.seq(LazySeq.java:49) ~[clojure-1.6.0.jar:?]
    at clojure.lang.Cons.next(Cons.java:39) ~[clojure-1.6.0.jar:?]
    at clojure.lang.RT.next(RT.java:598) ~[clojure-1.6.0.jar:?]
    at clojure.core$next.invoke(core.clj:64) ~[clojure-1.6.0.jar:?]
    at clojure.core.protocols$fn__6086.invoke(protocols.clj:146) ~[clojure-1.6.0.jar:?]
    at clojure.core.protocols$fn__6057$G__6052__6066.invoke(protocols.clj:19) ~[clojure-1.6.0.jar:?]
    at clojure.core.protocols$seq_reduce.invoke(protocols.clj:31) ~[clojure-1.6.0.jar:?]
    at clojure.core.protocols$fn__6078.invoke(protocols.clj:54) ~[clojure-1.6.0.jar:?]
    at clojure.core.protocols$fn__6031$G__6026__6044.invoke(protocols.clj:13) ~[clojure-1.6.0.jar:?]
    at clojure.core$reduce.invoke(core.clj:6289) ~[clojure-1.6.0.jar:?]
    at clojure.core$into.invoke(core.clj:6341) ~[clojure-1.6.0.jar:?]
    at backtype.storm.util$map_val.invoke(util.clj:304) ~[storm-core-0.10.0.2.3.6.0-3796.jar:0.10.0.2.3.6.0-3796]
    at backtype.storm.daemon.nimbus$compute_executors.invoke(nimbus.clj:491) ~[storm-core-0.10.0.2.3.6.0-3796.jar:0.10.0.2.3.6.0-3796]
    at backtype.storm.daemon.nimbus$compute_executor__GT_component.invoke(nimbus.clj:502) ~[storm-core-0.10.0.2.3.6.0-3796.jar:0.10.0.2.3.6.0-3796]
    at backtype.storm.daemon.nimbus$read_topology_details.invoke(nimbus.clj:394) ~[storm-core-0.10.0.2.3.6.0-3796.jar:0.10.0.2.3.6.0-3796]
    at backtype.storm.daemon.nimbus$mk_assignments$iter__7809__7813$fn__7814.invoke(nimbus.clj:722) ~[storm-core-0.10.0.2.3.6.0-3796.jar:0.10.0.2.3.6.0-3796]
    at clojure.lang.LazySeq.sval(LazySeq.java:40) ~[clojure-1.6.0.jar:?]
    at clojure.lang.LazySeq.seq(LazySeq.java:49) ~[clojure-1.6.0.jar:?]
    at clojure.lang.RT.seq(RT.java:484) ~[clojure-1.6.0.jar:?]
    at clojure.core$seq.invoke(core.clj:133) ~[clojure-1.6.0.jar:?]
    at clojure.core.protocols$seq_reduce.invoke(protocols.clj:30) ~[clojure-1.6.0.jar:?]
    at clojure.core.protocols$fn__6078.invoke(protocols.clj:54) ~[clojure-1.6.0.jar:?]
    at clojure.core.protocols$fn__6031$G__6026__6044.invoke(protocols.clj:13) ~[clojure-1.6.0.jar:?]
    at clojure.core$reduce.invoke(core.clj:6289) ~[clojure-1.6.0.jar:?]
    at clojure.core$into.invoke(core.clj:6341) ~[clojure-1.6.0.jar:?]
    at backtype.storm.daemon.nimbus$mk_assignments.doInvoke(nimbus.clj:721) ~[storm-core-0.10.0.2.3.6.0-3796.jar:0.10.0.2.3.6.0-3796]
    at clojure.lang.RestFn.invoke(RestFn.java:410) ~[clojure-1.6.0.jar:?]
    at backtype.storm.daemon.nimbus$fn__8060$exec_fn__3866__auto____8061$fn__8068$fn__8069.invoke(nimbus.clj:1112) ~[storm-core-0.10.0.2.3.6.0-3796.jar:0.10.0.2.3.6.0-3796]
    at backtype.storm.daemon.nimbus$fn__8060$exec_fn__3866__auto____8061$fn__8068.invoke(nimbus.clj:1111) ~[storm-core-0.10.0.2.3.6.0-3796.jar:0.10.0.2.3.6.0-3796]
    at backtype.storm.timer$schedule_recurring$this__2489.invoke(timer.clj:102) ~[storm-core-0.10.0.2.3.6.0-3796.jar:0.10.0.2.3.6.0-3796]
    at backtype.storm.timer$mk_timer$fn__2472$fn__2473.invoke(timer.clj:50) [storm-core-0.10.0.2.3.6.0-3796.jar:0.10.0.2.3.6.0-3796]
    at backtype.storm.timer$mk_timer$fn__2472.invoke(timer.clj:42) [storm-core-0.10.0.2.3.6.0-3796.jar:0.10.0.2.3.6.0-3796]
    at clojure.lang.AFn.run(AFn.java:22) [clojure-1.6.0.jar:?]
    at java.lang.Thread.run(Thread.java:745) [?:1.8.0_40]
2017-01-17 19:39:30.873 b.s.util [ERROR] Halting process: ("Error when processing an event")
java.lang.RuntimeException: ("Error when processing an event")
    at backtype.storm.util$exit_process_BANG_.doInvoke(util.clj:336) [storm-core-0.10.0.2.3.6.0-3796.jar:0.10.0.2.3.6.0-3796]
    at clojure.lang.RestFn.invoke(RestFn.java:423) [clojure-1.6.0.jar:?]
    at backtype.storm.daemon.nimbus$nimbus_data$fn__7411.invoke(nimbus.clj:118) [storm-core-0.10.0.2.3.6.0-3796.jar:0.10.0.2.3.6.0-3796]
    at backtype.storm.timer$mk_timer$fn__2472$fn__2473.invoke(timer.clj:71) [storm-core-0.10.0.2.3.6.0-3796.jar:0.10.0.2.3.6.0-3796]
    at backtype.storm.timer$mk_timer$fn__2472.invoke(timer.clj:42) [storm-core-0.10.0.2.3.6.0-3796.jar:0.10.0.2.3.6.0-3796]
    at clojure.lang.AFn.run(AFn.java:22) [clojure-1.6.0.jar:?]
    at java.lang.Thread.run(Thread.java:745) [?:1.8.0_40]
2017-01-17 19:39:30.876 b.s.d.nimbus [INFO] Shutting down master


Why is this happening? The Storm version is 0.10.0.2.3.

Any hints on how to debug this issue more thoroughly? Right now we cannot deploy new topologies.

1 ACCEPTED SOLUTION

@Davide Ferrari Unfortunately, this is a bug.

To get out of this situation, you need to stop the running topologies, clear the nodes that Storm maintains in ZooKeeper, and then restart Storm.

The ZooKeeper paths to clear are:
/storm/storms
/storm/assignments

Please note that I would be wary of doing this in production without understanding the impact.
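
For anyone who wants to review the ZooKeeper part of this before running it, here is a minimal dry-run sketch in Python. It only builds and prints the commands; it executes nothing. Everything in it beyond the two znode paths is an assumption: the `zkCli.sh` location is where the ZooKeeper client usually lives on an HDP node, the server address is a placeholder, and `rmr` is the recursive delete command of the ZooKeeper 3.4 CLI (newer releases use `deleteall`). Stopping the topologies and restarting Storm (for example through Ambari) happen outside this sketch.

```python
from typing import List, Sequence

# Assumed location of the ZooKeeper CLI on an HDP node; adjust for your install.
ZKCLI = "/usr/hdp/current/zookeeper-client/bin/zkCli.sh"

# The two znodes named in the answer above.
STORM_ZNODES = ("/storm/storms", "/storm/assignments")


def cleanup_plan(zk_server: str,
                 znodes: Sequence[str] = STORM_ZNODES) -> List[List[str]]:
    """Return the ZooKeeper commands to run, in order, without executing them.

    Run these only after the topologies and the Storm services are stopped,
    and restart Storm afterwards.
    """
    plan: List[List[str]] = []
    # First list each znode so you can see what is about to be removed.
    for path in znodes:
        plan.append([ZKCLI, "-server", zk_server, "ls", path])
    # Then delete each znode recursively (ZooKeeper 3.4 CLI syntax).
    for path in znodes:
        plan.append([ZKCLI, "-server", zk_server, "rmr", path])
    return plan


if __name__ == "__main__":
    # "zk1.example.com:2181" is a hypothetical ZooKeeper address.
    for cmd in cleanup_plan("zk1.example.com:2181"):
        print(" ".join(cmd))
```

Printing the plan first gives you a chance to check the `ls` output of each path and confirm that only Storm's own state is affected before you run the delete commands by hand.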

Hello @Santhosh B Gowda,

We fixed it by deleting the whole /storm path in ZooKeeper plus /var/hadoop/storm on the Nimbus hosts, and then redeploying the topologies. The only drawback is that we had to stop all the topologies for a few minutes, which caused minor downtime. Thanks for the help.