
Set environment variable in CDH for Spark executors

New Contributor

Hi,

 

I have been trying to set an environment variable for Spark, but I have run into problems.

 

I am using HDFS/YARN from CDH 5.12 together with a standalone Spark (v2.2.0) and Crail (https://github.com/zrlio/crail).  However, there is an error in the YARN logs saying that Crail's library path is not included in java.library.path.

 

...
17/11/27 10:57:50 INFO ibm.crail: crail.storage.rdma.type passive
17/11/27 10:57:50 INFO ibm.disni: creating RdmaProvider of type 'nat'
Exception in thread "dag-scheduler-event-loop" java.lang.UnsatisfiedLinkError: no disni in java.library.path
    at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1867)
    at java.lang.Runtime.loadLibrary0(Runtime.java:870)
    at java.lang.System.loadLibrary(System.java:1122)
...

 

 

I found a post in Crail's user group saying that this can be fixed by setting the following environment variable:

LD_LIBRARY_PATH=/opt/crail/crail-1.0-bin/lib:$LD_LIBRARY_PATH

or: 

spark.executor.extraJavaOptions -Djava.library.path=/opt/crail/crail-1.0-bin/lib

 

Here is the post:

https://groups.google.com/forum/#!topic/zrlio-users/_P5NeH3iHxE
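
For reference, here is roughly how I understand those two options would be passed when submitting a job (just a sketch; the jar and class names below are placeholders, and the only real value is the Crail path from the post above):

spark-submit \
  --master yarn \
  --conf "spark.executor.extraJavaOptions=-Djava.library.path=/opt/crail/crail-1.0-bin/lib" \
  --conf "spark.executorEnv.LD_LIBRARY_PATH=/opt/crail/crail-1.0-bin/lib:$LD_LIBRARY_PATH" \
  --class com.example.MyApp my-app.jar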

 

Can you please advise where I should set this environment variable in CDH?

 

I tried setting the environment variable in ~/.bashrc and in spark-env.sh, but it didn't work; it seems CDH resets all environment variables when starting services.

 

I also tried setting the environment variable in every place I could find in CDH, including the Environment configuration for the Cloudera Management Service, YARN, and HDFS, but the problem is still not solved.

 

Thanks,

Kevin

 

 

1 ACCEPTED SOLUTION

Contributor
This is the order of precedence for configurations that Spark will use (highest first):

- Properties set on SparkConf or SparkContext in code
- Arguments passed to spark-submit, spark-shell, or pyspark at run time
- Properties set in /etc/spark/conf/spark-defaults.conf, in a specified properties file, or in a Cloudera Manager safety valve
- Environment variables exported or set in scripts

For properties that apply to all jobs, use spark-defaults.conf; for properties that are constant and specific to a single application or a few applications, use SparkConf or --properties-file; for properties that change between runs, use command-line arguments.
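
As a rough illustration, using the library path from the question (adjust paths to your environment; not verified on your cluster):

In spark-defaults.conf, applies to every job:
spark.executor.extraJavaOptions -Djava.library.path=/opt/crail/crail-1.0-bin/lib

On the command line, for a single run (overrides spark-defaults.conf):
spark-submit --conf "spark.executor.extraJavaOptions=-Djava.library.path=/opt/crail/crail-1.0-bin/lib" ...

In application code (overrides both), e.g. in Scala:
val conf = new SparkConf().set("spark.executor.extraJavaOptions", "-Djava.library.path=/opt/crail/crail-1.0-bin/lib")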



New Contributor

To simplify the question: how can I set multiple environment variables under "yarn.nodemanager.admin-en"?
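
From what I can tell, that property (presumably yarn.nodemanager.admin-env; the name above is cut off) takes a comma-separated list of NAME=VALUE pairs, so I assume something like the following would go into yarn-site.xml or the corresponding Cloudera Manager field (just my guess, not verified):

<property>
  <name>yarn.nodemanager.admin-env</name>
  <!-- comma-separated NAME=VALUE pairs; MALLOC_ARENA_MAX is the stock default, the Crail path is from my post above -->
  <value>MALLOC_ARENA_MAX=$MALLOC_ARENA_MAX,LD_LIBRARY_PATH=/opt/crail/crail-1.0-bin/lib</value>
</property>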

 
