
Set environment variable in CDH for Spark executors

New Contributor

Hi,

I have been trying to set an environment variable for Spark, but I am running into problems.

I am using HDFS/YARN from CDH 5.12 with a standalone Spark (v2.2.0), running together with Crail (https://github.com/zrlio/crail). However, there is an error in the YARN logs saying that Crail's library path is not included in java.library.path:

 

...
17/11/27 10:57:50 INFO ibm.crail: crail.storage.rdma.type passive
17/11/27 10:57:50 INFO ibm.disni: creating RdmaProvider of type 'nat'
Exception in thread "dag-scheduler-event-loop" java.lang.UnsatisfiedLinkError: no disni in java.library.path
        at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1867)
        at java.lang.Runtime.loadLibrary0(Runtime.java:870)
        at java.lang.System.loadLibrary(System.java:1122)
...

 

 

I found a post in Crail's user group saying that this can be fixed by setting the following variable:

LD_LIBRARY_PATH=/opt/crail/crail-1.0-bin/lib:$LD_LIBRARY_PATH

or: 

spark.executor.extraJavaOptions -Djava.library.path=/opt/crail/crail-1.0-bin/lib

 

Here is the post:

https://groups.google.com/forum/#!topic/zrlio-users/_P5NeH3iHxE
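
For reference, here is a sketch of how the second option might be passed per job with spark-submit (the driver-side setting and the class/jar names are my own assumptions, not from the post):

spark-submit \
  --conf "spark.executor.extraJavaOptions=-Djava.library.path=/opt/crail/crail-1.0-bin/lib" \
  --conf "spark.driver.extraJavaOptions=-Djava.library.path=/opt/crail/crail-1.0-bin/lib" \
  --class com.example.MyApp my-app.jar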

 

Can you please guide me on where I should set this environment variable inside CDH?

 

I tried to set the environment variable inside ~/.bashrc and spark-env.sh. However, it didn't work; it seems CDH resets all environment variables when starting services.

 

I also tried setting the environment variable in all the places I could find inside CDH, including the Environment configuration of the Cloudera Management Service, YARN, and HDFS. But the problem is still not solved.

 

Thanks,

Kevin

 

 


2 REPLIES

New Contributor

To simplify the question: how can I set multiple environment variables under "yarn.nodemanager.admin-env"?
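
For reference, my understanding is that yarn.nodemanager.admin-env takes a comma-separated list of KEY=VALUE pairs, so (keeping the default MALLOC_ARENA_MAX entry and using the Crail path from above as an example) a value might look like:

yarn.nodemanager.admin-env = MALLOC_ARENA_MAX=$MALLOC_ARENA_MAX,LD_LIBRARY_PATH=/opt/crail/crail-1.0-bin/lib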

 

Contributor
This is the order of precedence for configurations that Spark will use:

- Properties set on SparkConf or SparkContext in code
- Arguments passed to spark-submit, spark-shell, or pyspark at run time
- Properties set in /etc/spark/conf/spark-defaults.conf, a specified properties file, or in a Cloudera Manager safety valve
- Environment variables exported or set in scripts

For properties that apply to all jobs, use spark-defaults.conf; for properties that are constant and specific to a single or a few applications, use SparkConf or --properties-file; for properties that change between runs, use command line arguments.
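
For example, assuming the Crail path from the question and taking spark.executor.extraLibraryPath / spark.driver.extraLibraryPath as the properties that extend the native library path, a cluster-wide setting in /etc/spark/conf/spark-defaults.conf (or the equivalent Cloudera Manager safety valve) might look like:

spark.executor.extraLibraryPath /opt/crail/crail-1.0-bin/lib
spark.driver.extraLibraryPath /opt/crail/crail-1.0-bin/lib

The per-run equivalent would be passing the same properties with --conf on the spark-submit command line.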