
Re: About spark 1.5 release!

Explorer

Steps

1) Build Spark 1.5.1 for your Hadoop version. See http://www.cloudera.com/content/www/en-us/documentation/enterprise/latest/topics/cdh_vd_cdh5_maven_r... for the right Maven version for your CDH release.

 

mvn -Pyarn -Phadoop-2.6 -Dhadoop.version=2.6.0-cdh5.4.0 -Phive -Phive-thriftserver -DskipTests clean package

(or download the prebuilt spark-1.5.1-bin-hadoop2.6 package, which works with CDH 5.4.0)
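If you take the download route, the prebuilt tarball is on the Apache archive (the URL is my assumption of the standard archive layout; verify it before scripting against it):

wget https://archive.apache.org/dist/spark/spark-1.5.1/spark-1.5.1-bin-hadoop2.6.tgz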

 

2) Untar

tar -xvf spark-1.5.1-bin-hadoop2.6.tgz

Set SPARK_HOME to the new Spark location.
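For example (the path is illustrative; use wherever you actually untarred it):

export SPARK_HOME=/opt/spark-1.5.1-bin-hadoop2.6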

 

3) Copy the config from /etc/spark/conf to $SPARK_HOME/conf, and /etc/spark/conf/yarn-conf to $SPARK_HOME/conf/yarn-conf
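Concretely, one recursive copy covers both, since yarn-conf sits under /etc/spark/conf as a subdirectory (assuming the standard CDH client config location):

cp -r /etc/spark/conf/* $SPARK_HOME/conf/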

 

4) Change the following:

a) update SPARK_HOME in conf/spark-env.sh

b) update spark.yarn.jar 

 

spark.yarn.jar in the CDH 5.4 config has a local: prefix for local files. Spark 1.5 does not like this; just use the full path to your spark assembly jar.

 

spark.yarn.jar=local:/opt/cloudera/parcels/....

I had changed this to point at the 1.5.1 version of the spark assembly jar, keeping the prefix:

spark.yarn.jar=local:/opt/spark-1.5.1-bin/...

and this didn't work; I had to drop the "local:" prefix:

spark.yarn.jar=/opt/spark-1.5.1-bin/...
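For 4b, the full line in conf/spark-defaults.conf ends up looking like this (the path is illustrative; in the prebuilt 1.5.1 tarball the assembly jar sits under lib/):

spark.yarn.jar=/opt/spark-1.5.1-bin-hadoop2.6/lib/spark-assembly-1.5.1-hadoop2.6.0.jar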

 

 

Re: About spark 1.5 release!

Expert Contributor

Thanks for the steps. How does one go about deploying these changes (script modifications, environment variable changes, etc.) to the entire CDH cluster?

 

Thanks!

Re: About spark 1.5 release!

Explorer

You don't need any changes on the cluster. Each YARN application is self-contained, so it ships everything it requires on each invocation. That means you can install Spark 1.5.1 on the edge node under your own application user account, and run multiple Spark versions concurrently on the cluster, including the one CDH shipped with.
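As an illustration, once SPARK_HOME points at the 1.5.1 install on the edge node, a job submitted from there runs on the cluster with no cluster-side changes (SparkPi and the examples jar ship inside the prebuilt tarball; yarn-client is the Spark 1.x master syntax):

$SPARK_HOME/bin/spark-submit --master yarn-client --class org.apache.spark.examples.SparkPi $SPARK_HOME/lib/spark-examples-1.5.1-hadoop2.6.0.jar 10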

 

Deenar

Re: About spark 1.5 release!

Expert Contributor

Thanks for the info, but how would that work in the case of a long-running job scheduled via Oozie/Hue?

Re: About spark 1.5 release!

Explorer

b) update spark.yarn.jar

In which file should this be updated?

Re: About spark 1.5 release!

Explorer

Actually this is optional; if you omit it, the spark assembly jar will be copied over to the cluster for every Spark job you run on YARN.
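If you want to avoid that per-job copy without using the local: prefix, one option (described in the Spark on YARN docs) is to put the assembly on HDFS and point spark.yarn.jar at it; the paths here are illustrative:

hadoop fs -mkdir -p /user/spark/share/lib
hadoop fs -put $SPARK_HOME/lib/spark-assembly-1.5.1-hadoop2.6.0.jar /user/spark/share/lib/

then in conf/spark-defaults.conf:

spark.yarn.jar=hdfs:///user/spark/share/lib/spark-assembly-1.5.1-hadoop2.6.0.jar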

Re: About spark 1.5 release!

New Contributor

Hi,

I upgraded to 1.5.2, and now when I start Spark I get an error:

 

[ec2-user@ip-10-1-1-194 bin]$ ./spark-shell
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/ec2-user/apache_spark/spark-1.5.2-bin-hadoop2.6/lib/spark-assembly-1.5.2-hadoop2.6.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-5.4.8-1.cdh5.4.8.p0.4/jars/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-5.4.8-1.cdh5.4.8.p0.4/jars/avro-tools-1.7.6-cdh5.4.8.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
log4j:ERROR Could not read configuration file from URL [file:/var/run/cloudera-scm-agent/process/ccdeploy_spark-conf_etcsparkconf.cloudera.CD-SPARK_ON_YARN-LTXiuVwH_-1190306962128363707/yarn-conf/log4j.properties].

 

I expected this:

[ec2-user@ip-10-1-1-79 bin]$ ./spark-shell
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/ec2-user/apache_spark/spark-1.5.2-bin-hadoop2.6/lib/spark-assembly-1.5.2-hadoop2.6.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-5.4.8-1.cdh5.4.8.p0.4/jars/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-5.4.8-1.cdh5.4.8.p0.4/jars/avro-tools-1.7.6-cdh5.4.8.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
15/11/25 04:59:10 INFO SecurityManager: Changing view acls to: ec2-user
15/11/25 04:59:10 INFO SecurityManager: Changing modify acls to: ec2-user

 

Looks like a security issue...

Any idea?

 

greetings

-Jerry

Re: About spark 1.5 release!

New Contributor

Just saw that my new conf directory did not contain the log4j.properties file.

I added it manually, and now Spark comes up.
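For anyone hitting the same error: the Spark distribution ships a log4j template you can copy into place (the yarn-conf destination is my assumption, based on the path in the error message above):

cp $SPARK_HOME/conf/log4j.properties.template $SPARK_HOME/conf/yarn-conf/log4j.properties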

 

Re: About spark 1.5 release!

New Contributor

I thought Spark 1.5 was included in the CDH 5.5.x release. But when looking at the "What's new" section, I see:

  • Apache Spark - 0.90 or later with CDH 4.4.0 or later.

???