Spark on CDM 5.7.1 and Ubuntu 14.04

Hi,

 

I'm having a problem with Spark and Hue on CDM similar to the one described in this post:

 

http://community.cloudera.com/t5/Advanced-Analytics-Apache-Spark/required-help-to-install-spark-on-C...

 

I have a running CDM 5.7.1 cluster on Ubuntu 14.04, with all services working fine apart from Spark and Impala (see the parcel errors below).

 

It appears that the Spark history servers and gateways are installed, but I can't activate Spark in Standalone mode or Spark on YARN. In the Parcels section I am getting errors for a number of services:

 

  • Error for parcel SPARK-0.9.0-1.cdh4.6.0.p0.98-trusty : Parcel not available for OS Distribution UBUNTU_TRUSTY.
  • Error for parcel SOLR-1.3.0-1.cdh4.5.0.p0.9-trusty : Parcel not available for OS Distribution UBUNTU_TRUSTY.
  • Error for parcel IMPALA-2.1.0-1.impala2.0.0.p0.1995-trusty : Parcel not available for OS Distribution UBUNTU_TRUSTY.
  • Error for parcel ACCUMULO-1.4.4-1.cdh4.5.0.p0.65-trusty : Parcel not available for OS Distribution UBUNTU_TRUSTY.
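
To make the pattern in these errors easier to see, here is a small Python sketch that splits each parcel name into its parts. It assumes the names follow a `PRODUCT-VERSION-DISTRO` layout, which is only inferred from the four names above; the helper `split_parcel_name` is a hypothetical name for illustration and is not part of any Cloudera tool.

```python
# Parcel names copied from the errors above.
PARCELS = [
    "SPARK-0.9.0-1.cdh4.6.0.p0.98-trusty",
    "SOLR-1.3.0-1.cdh4.5.0.p0.9-trusty",
    "IMPALA-2.1.0-1.impala2.0.0.p0.1995-trusty",
    "ACCUMULO-1.4.4-1.cdh4.5.0.p0.65-trusty",
]


def split_parcel_name(name):
    """Split 'PRODUCT-VERSION-DISTRO' into its three parts.

    Assumption: the product is everything before the first hyphen and
    the distro suffix is everything after the last hyphen.
    """
    product, rest = name.split("-", 1)
    version, distro = rest.rsplit("-", 1)
    return product, version, distro


for parcel in PARCELS:
    product, version, distro = split_parcel_name(parcel)
    print(f"{product:10} version={version:30} distro={distro}")
```

Three of the four version strings contain `cdh4` rather than `cdh5`, so it may be worth checking which remote parcel repository URLs are configured for this cluster.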

Having checked, this appears to mean that there isn't an Ubuntu Trusty version of the above parcels.

 

Can you confirm whether this is the case?

 

If so, can I install the components via apt-get:

 

sudo apt-get install spark-core spark-master spark-worker spark-history-server spark-python

as described in this link for CDM 5.4.x:

 

http://www.cloudera.com/documentation/enterprise/5-4-x/topics/cdh_ig_spark_install.html

 

Any guidance on this would be appreciated.

 

After installing Spark, I'd like to activate the Hue Spark Notebook, but I can see that in Hue the app_blacklist is set to:

 

app_blacklist

['spark', 'zookeeper', 'security']

 

I have removed 'spark' and 'zookeeper' from the app_blacklist, leaving only 'security', then restarted the Hue service and refreshed the Hue web UI. The hue.ini dump now shows only 'security', but I still don't have a Spark notebook available. This may be because the notebook depends on the Spark parcels being installed.
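
For reference, the change I made amounts to the following, shown as a minimal Python sketch. It assumes that `app_blacklist` is a comma-separated value under the `[desktop]` section of hue.ini; the fragment below is a simplified stand-in for the real file, and `enable_apps` is a hypothetical helper name, not a Hue or Cloudera Manager function.

```python
import configparser

# Simplified stand-in for the relevant part of hue.ini (assumption:
# app_blacklist is a comma-separated list under [desktop]).
HUE_INI_FRAGMENT = """
[desktop]
app_blacklist=spark,zookeeper,security
"""


def enable_apps(ini_text, apps_to_enable):
    """Return the new app_blacklist value with the given apps removed."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    raw = parser["desktop"]["app_blacklist"]
    current = [app.strip() for app in raw.split(",") if app.strip()]
    remaining = [app for app in current if app not in apps_to_enable]
    return ",".join(remaining)


# Remove 'spark' and 'zookeeper', leaving only 'security' blacklisted.
print(enable_apps(HUE_INI_FRAGMENT, {"spark", "zookeeper"}))
```

This only illustrates the edit to the blacklist value itself. On my cluster the value was changed through the Hue configuration and the service restarted, as described above.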

 

If I have to reinstall the cluster on Red Hat Linux to activate the Spark parcels, that is a possibility, but I'd prefer to get everything working on Ubuntu 14.04 first if possible.

 

Any guidance on which route to take would be appreciated.

 

Regards

 

natdacruz