HDP-2.6.1 Spark Installation fails only in one node

Explorer

Hi everyone. I've spent several days trying to install HDP 2.6.1.0 on a RHEL 7.2 cluster. My current problem is that Apache Spark installs on three nodes but fails on one node. I've checked the Python scripts and they are the same on every host. Do I need something different on the node that's failing?

The log for successful installation is:

2017-07-26 14:24:38,858 - /etc/hadoop/conf is already linked to /etc/hadoop/2.6.1.0-129/0
2017-07-26 14:24:38,858 - /etc/mahout/conf is already linked to /etc/mahout/2.6.1.0-129/0
2017-07-26 14:24:38,858 - Skipping /etc/storm/conf as it does not exist.
2017-07-26 14:24:38,858 - /etc/atlas/conf is already linked to /etc/atlas/2.6.1.0-129/0
2017-07-26 14:24:38,858 - Skipping /etc/ranger/admin/conf as it does not exist.
2017-07-26 14:24:38,859 - /etc/flume/conf is already linked to /etc/flume/2.6.1.0-129/0
2017-07-26 14:24:38,859 - /etc/sqoop/conf is already linked to /etc/sqoop/2.6.1.0-129/0
2017-07-26 14:24:38,859 - /etc/accumulo/conf is already linked to /etc/accumulo/2.6.1.0-129/0
2017-07-26 14:24:38,860 - Skipping /etc/phoenix/conf as it does not exist.
2017-07-26 14:24:38,860 - /etc/storm-slider-client/conf is already linked to /etc/storm-slider-client/2.6.1.0-129/0
2017-07-26 14:24:38,860 - /etc/slider/conf is already linked to /etc/slider/2.6.1.0-129/0
2017-07-26 14:24:38,860 - Skipping /etc/zeppelin/conf as it does not exist.
2017-07-26 14:24:38,861 - /etc/hive-webhcat/conf is already linked to /etc/hive-webhcat/2.6.1.0-129/0
2017-07-26 14:24:38,861 - /etc/hive-hcatalog/conf is already linked to /etc/hive-hcatalog/2.6.1.0-129/0
2017-07-26 14:24:38,861 - /etc/falcon/conf is already linked to /etc/falcon/2.6.1.0-129/0
2017-07-26 14:24:38,861 - Skipping /etc/knox/conf as it does not exist.
2017-07-26 14:24:38,862 - /etc/pig/conf is already linked to /etc/pig/2.6.1.0-129/0
2017-07-26 14:24:38,862 - /etc/spark2/conf is already linked to /etc/spark2/2.6.1.0-129/0
2017-07-26 14:24:38,862 - /etc/hive/conf is already linked to /etc/hive/2.6.1.0-129/0
Command completed successfully!

And the log for failing installation is:

stderr: /var/lib/ambari-agent/data/errors-599.txt
File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 155, in __init__
    self.env.run()
  File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 160, in run
    self.run_action(resource, action)
  File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 124, in run_action
    provider_action()
  File "/usr/lib/python2.6/site-packages/resource_management/core/providers/system.py", line 120, in action_create
    raise Fail("Applying %s failed, parent directory %s doesn't exist" % (self.resource, dirname))
resource_management.core.exceptions.Fail: Applying File['/usr/hdp/current/spark-client/conf/spark-defaults.conf'] failed, parent directory /usr/hdp/current/spark-client/conf doesn't exist
stdout: /var/lib/ambari-agent/data/output-599.txt
......
2017-07-26 14:24:23,893 - Using hadoop conf dir: /usr/hdp/current/hadoop-client/conf
2017-07-26 14:24:23,907 - call['ambari-python-wrap /usr/bin/hdp-select status spark-client'] {'timeout': 20}
2017-07-26 14:24:23,939 - call returned (0, 'spark-client - 2.6.1.0-129')
2017-07-26 14:24:23,948 - Directory['/var/run/spark'] {'owner': 'spark', 'create_parents': True, 'group': 'hadoop', 'mode': 0775}
2017-07-26 14:24:23,949 - Directory['/var/log/spark'] {'owner': 'spark', 'group': 'hadoop', 'create_parents': True, 'mode': 0775}
2017-07-26 14:24:23,950 - PropertiesFile['/usr/hdp/current/spark-client/conf/spark-defaults.conf'] {'owner': 'spark', 'key_value_delimiter': ' ', 'group': 'spark', 'mode': 0644, 'properties': ...}
2017-07-26 14:24:23,958 - Generating properties file: /usr/hdp/current/spark-client/conf/spark-defaults.conf
2017-07-26 14:24:23,959 - File['/usr/hdp/current/spark-client/conf/spark-defaults.conf'] {'owner': 'spark', 'content': InlineTemplate(...), 'group': 'spark', 'mode': 0644}
Command failed after 1 tries

The step "Generating properties file: /usr/hdp/current/spark-client/conf/spark-defaults.conf" fails only on this one node. As you can see, the same step completes successfully on the other nodes.
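
For reference, these are the paths involved (taken from the logs above); comparing them between a working node and the failing one is probably the quickest check, assuming the 2.6.1.0-129 layout:

hdp-select status spark-client              # the log above shows this returning "spark-client - 2.6.1.0-129"
ls -ld /usr/hdp/current/spark-client        # should be a symlink into /usr/hdp/2.6.1.0-129/
ls -ld /usr/hdp/current/spark-client/conf   # the parent directory the failing script complains about
ls -ld /etc/spark/2.6.1.0-129/0             # backing conf dir, following the /etc/<component>/2.6.1.0-129/0 pattern from the success log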

Thanks in advance for any advice; we are really interested in using this stack for some use cases at our company.

Regards,

Miguel

3 REPLIES

@Malena Maguina

It seems that, for some reason, Spark is not able to create its conf directory on that host.

I'd suggest creating the "/usr/hdp/current/spark-client/conf" directory manually and copying the required configuration files from the other nodes.
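
Something along these lines might work as a workaround (GOOD_NODE is a placeholder for one of the hosts where the install succeeded):

# create the missing conf directory; if /usr/hdp/current/spark-client itself is a
# dangling symlink, fix that first (e.g. with hdp-select)
mkdir -p /usr/hdp/current/spark-client/conf
# pull the generated configuration files from a healthy node
scp GOOD_NODE:/usr/hdp/current/spark-client/conf/* /usr/hdp/current/spark-client/conf/
# most generated files are owned by spark:spark (see the listing below); adjust per file as needed
chown spark:spark /usr/hdp/current/spark-client/conf/*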

Typically the Spark conf directory should look like this:

[root@ conf]# ls -lrt
total 64
-rwxr-xr-x 1 root  root   3418 Dec 16  2015 spark-env.sh.template
-rw-r--r-- 1 root  root    507 Dec 16  2015 spark-defaults.conf.template
-rw-r--r-- 1 root  root     80 Dec 16  2015 slaves.template
-rw-r--r-- 1 root  root   5886 Dec 16  2015 metrics.properties.template
-rw-r--r-- 1 root  root    949 Dec 16  2015 log4j.properties.template
-rw-r--r-- 1 root  root    303 Dec 16  2015 fairscheduler.xml.template
-rw-r--r-- 1 root  root    202 Dec 16  2015 docker.properties.template
-rw-r--r-- 1 spark spark   620 Apr 21 18:09 log4j.properties
-rw-r--r-- 1 spark spark  4956 Apr 21 18:09 metrics.properties
-rw-r--r-- 1 spark spark   736 Apr 21 18:09 hive-site.xml
-rwxr-xr-x 1 spark spark   253 Jun  8 22:25 spark-thrift-fairscheduler.xml
-rw-r--r-- 1 spark spark  1911 Jul 26 11:17 spark-env.sh
-rw-r--r-- 1 spark spark   948 Jul 26 12:53 spark-defaults.conf
-rw-r--r-- 1 hive  hadoop  973 Jul 26 12:53 spark-thrift-sparkconf.conf
[root@ conf]#

Run the command below as well, to check whether the Spark packages are installed:

rpm -qa | grep spark

Super Mentor

@Miguel Marquez

Sometimes this can happen if an older version of the Spark binaries is already installed on the problematic host, or if a previous installation was left incomplete.

On that host, please check and reinstall:

# rpm -qa | grep spark
# yum remove spark_xxxx
# yum clean all
# yum install spark_xxxx

Please replace xxxx with your desired version. Reinstalling the package should fix missing symlink issues.
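
If the conf symlink is still missing after the reinstall, it can also be repointed manually with hdp-select; a minimal sketch, assuming the 2.6.1.0-129 build shown in your logs:

# repoint /usr/hdp/current/spark-client at the 2.6.1.0-129 build
hdp-select set spark-client 2.6.1.0-129
# verify: this should now match the output on the working nodes
hdp-select status spark-client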


Explorer

Miguel Marquez

Please try installing the component with the Ambari REST API, like below:

curl --user <ambari-admin-user>:<ambari-admin-password> -i -H 'X-Requested-By: ambari' -X POST http://AMBARI_SERVER_HOST:8080/api/v1/clusters/CLUSTER_NAME/hosts/NEW_HOST_ADDED/host_components/SPA...
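
The POST above only registers the component on the new host (SPARK_CLIENT is assumed here as the component name at the end of the URL); a hedged sketch of the follow-up request that asks Ambari to actually run the install:

curl --user <ambari-admin-user>:<ambari-admin-password> -i -H 'X-Requested-By: ambari' -X PUT \
  -d '{"HostRoles": {"state": "INSTALLED"}}' \
  http://AMBARI_SERVER_HOST:8080/api/v1/clusters/CLUSTER_NAME/hosts/NEW_HOST_ADDED/host_components/SPARK_CLIENT

Ambari should return a request resource that can be polled for progress, and the operation also shows up under background operations in the Ambari UI.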