Exception during Role Assignment in new Spark Cluster

I'm attempting to build a new CDH 5.6.0 cluster with Spark (1.5.0+cdh5.6.0+113) on Ubuntu 14.04 LTS.

I can get Cloudera Manager installed, and it detects the hosts correctly; we only hit an error when we reach the Role Assignment page. The exact error we're getting is:

2016-04-04 22:04:47,181 INFO 2078186713@scm-web-41:com.cloudera.server.web.common.JFrameException: Exception report generated accessing http://cloudera-mgr.domain.com:7180/cmf/clusters/3/express-add-services/update
Exception executing consequence for rule "Compute hiveserver2_spark_executor_cores" in com.cloudera.cmf.rules: org.drools.RuntimeDroolsException: java.lang.ArithmeticException: / by zero

This seems to imply that the number of cores on the selected nodes is somehow zero? (A guess at how that could happen is sketched at the end of this post.)

If I navigate to /cmf/hardware/hosts in CM, I can verify that all of the nodes are sending a proper heartbeat and that they all have a non-zero number of cores listed. Each node has two Intel(R) Xeon(R) CPU E5-2660 @ 2.20GHz processors, which show up as Cores: 16 (32 w/ Hyperthreading).

Has anyone run into this type of problem, or does anyone know how to get past it?
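
As a guess at the failure mode, here's a minimal, hypothetical sketch (in Java, since the exception is a Java ArithmeticException; this is not actual Cloudera Manager code) of how a rule that derives executor cores from the hosts in a role group could throw exactly this exception when the group comes up empty:

import java.util.List;

public class ExecutorCoresSketch {

    // Hypothetical stand-in for the "Compute hiveserver2_spark_executor_cores"
    // rule: average the cores across the hosts assigned to a NodeManager group.
    static int computeExecutorCores(List<Integer> nodeManagerHostCores) {
        int totalCores = nodeManagerHostCores.stream().mapToInt(Integer::intValue).sum();
        int hostCount = nodeManagerHostCores.size();
        // Integer division: if no hosts ended up in the group, hostCount is 0
        // and this line throws java.lang.ArithmeticException: / by zero.
        return totalCores / hostCount;
    }

    public static void main(String[] args) {
        System.out.println(computeExecutorCores(List.of(32, 32, 32))); // prints 32
        System.out.println(computeExecutorCores(List.of()));           // throws / by zero
    }
}

So even with every host reporting its cores correctly, a grouping that leaves some role group empty (or with zero usable cores) would still divide by zero.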

Re: Exception during Role Assignment in new Spark Cluster

In case anyone else runs into this issue: the problem was isolated to a handful of Hosts in the new cluster which were selected to run the NodeManager (YARN) role, which by default simply mirrors the DataNode assignment.

To get past this error message in Cloudera Manager, I had to deselect the Hosts causing the error from the NodeManager role assignment only. It took a bit of trial and error, but eventually we found the Host the install wizard didn't like (which, again, had a non-zero number of Cores). Omitting that Host and clicking Continue got us past the error message.

After the rest of the services were installed and activated, I was able to go back into the YARN instance and add the Hosts that had triggered the error during the installation wizard as NodeManagers, and they started without issue. It's been a few hours and I still have green icons next to all the services, so I'm assuming everything is fine now.

I'm still not sure why NodeManager would trigger a divide by zero for hiveserver2_spark_executor_cores, nor how it determined there were zero cores when every Host has a non-zero core count reported by CM.

Re: Exception during Role Assignment in new Spark Cluster

There were some subtle bugs in the calculation of certain Hive on Spark tuning parameters, which involve pretty complicated logic. I saw this exception during development of the next release, though not in exactly your scenario. Still, I believe this should be fixed in the next minor release of Cloudera Manager (5.7).

The bug is probably related to having hosts with different hardware or role assignments, which causes CM to generate multiple role config groups by default. Removing that host likely meant CM generated identical configs for all NodeManagers, so they were merged into a single role config group (the default group), which avoided the bug.
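
To make that concrete, here is a toy illustration (not the actual CM logic; the Host fields and profile key are made up) of how hosts with identical generated configs collapse into one default group, while a single odd host forces a second group into existence:

import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class RoleConfigGroupSketch {

    // Made-up host descriptor; CM's real model is far richer.
    record Host(String name, int cores, long memGb) {}

    public static void main(String[] args) {
        List<Host> hosts = List.of(
            new Host("node1", 32, 128),
            new Host("node2", 32, 128),
            new Host("node3", 16, 64)); // the odd one out

        // Group hosts by their hardware profile; each distinct profile would
        // become its own role config group by default.
        Map<String, List<Host>> groups = hosts.stream()
            .collect(Collectors.groupingBy(h -> h.cores() + "c/" + h.memGb() + "g"));

        // Two groups here; drop node3 and everything merges into one group.
        groups.forEach((profile, members) -> System.out.println(profile + " -> " + members));
    }
}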

You can also probably work around this by simply not selecting the Spark service initially and adding it later. If you're on CM < 5.7 (i.e., all versions released as of today, 4/6/2016), you'll need to apply the performance tuning manually if you want to use Hive on Spark (a beta feature in current releases) and see reasonable performance.
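
For the manual tuning, something along these lines is a reasonable starting point. This is a rough sketch using common Spark heuristics (around 4 cores per executor and roughly 15% off-heap overhead), with an assumed per-host YARN memory figure; it is not the formula CM 5.7 will use:

public class HiveOnSparkTuningSketch {
    public static void main(String[] args) {
        int hostCores = 32;             // vcores per NodeManager host, as reported by CM
        long hostMemMb = 96L * 1024;    // assumed memory available to YARN per host

        int coresPerExecutor = 4;                            // common heuristic
        int executorsPerHost = hostCores / coresPerExecutor; // 8 in this example
        long memPerExecutorMb = hostMemMb / executorsPerHost;
        long overheadMb = (long) (memPerExecutorMb * 0.15);  // off-heap overhead
        long heapMb = memPerExecutorMb - overheadMb;

        System.out.println("spark.executor.cores=" + coresPerExecutor);
        System.out.println("spark.executor.memory=" + heapMb + "m");
        System.out.println("spark.yarn.executor.memoryOverhead=" + overheadMb);
    }
}

You would put the resulting values in spark-defaults.conf (or the corresponding safety valve in CM) and revisit them once 5.7's automatic tuning is available.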