
Set environment variable in CDH for Spark executors

New Contributor

Hi,

I have been trying to set an environment variable for Spark, but I am running into problems.

I am using HDFS/YARN from CDH 5.12 with a standalone Spark (v2.2.0), running together with Crail (https://github.com/zrlio/crail). However, there is an error in the YARN logs saying that Crail's library path is not included in java.library.path:

 

...
17/11/27 10:57:50 INFO ibm.crail: crail.storage.rdma.type passive
17/11/27 10:57:50 INFO ibm.disni: creating RdmaProvider of type 'nat'
Exception in thread "dag-scheduler-event-loop" java.lang.UnsatisfiedLinkError: no disni in java.library.path
        at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1867)
        at java.lang.Runtime.loadLibrary0(Runtime.java:870)
        at java.lang.System.loadLibrary(System.java:1122)
...

 

 

I found a post in Crail's user group saying that this can be fixed by setting the following variable:

LD_LIBRARY_PATH=/opt/crail/crail-1.0-bin/lib:$LD_LIBRARY_PATH

or: 

spark.executor.extraJavaOptions -Djava.library.path=/opt/crail/crail-1.0-bin/lib

 

Here is the post:

https://groups.google.com/forum/#!topic/zrlio-users/_P5NeH3iHxE
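
For reference, here is a sketch of how the second option might be passed per job with spark-submit (the driver-side setting and the class/jar names are my own assumptions, not from the post):

spark-submit \
  --conf "spark.executor.extraJavaOptions=-Djava.library.path=/opt/crail/crail-1.0-bin/lib" \
  --conf "spark.driver.extraJavaOptions=-Djava.library.path=/opt/crail/crail-1.0-bin/lib" \
  --class com.example.MyApp my-app.jar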

 

Can you please guide me on where I should set this environment variable inside CDH?

 

I tried to set the environment variable inside ~/.bashrc and spark-env.sh. However, it didn't work; it seems CDH resets all environment variables when starting services.

 

I also tried setting the environment variable in all the places I could find inside CDH, including the Environment configuration of the Cloudera Management Service, YARN, and HDFS. But the problem is still not solved.

 

Thanks,

Kevin

 

 


2 REPLIES

New Contributor

To simplify the question: how can I set multiple environment variables under "yarn.nodemanager.admin-env"?
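
For reference, my understanding is that yarn.nodemanager.admin-env takes a comma-separated list of KEY=VALUE pairs, so (keeping the default MALLOC_ARENA_MAX entry and using the Crail path from above as an example) a value might look like:

yarn.nodemanager.admin-env = MALLOC_ARENA_MAX=$MALLOC_ARENA_MAX,LD_LIBRARY_PATH=/opt/crail/crail-1.0-bin/lib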

 

Contributor
This is the order of precedence for configurations that Spark will use:

- Properties set on SparkConf or SparkContext in code
- Arguments passed to spark-submit, spark-shell, or pyspark at run time
- Properties set in /etc/spark/conf/spark-defaults.conf, a specified properties file, or in a Cloudera Manager safety valve
- Environment variables exported or set in scripts

For properties that apply to all jobs, use spark-defaults.conf; for properties that are constant and specific to a single or a few applications, use SparkConf or --properties-file; for properties that change between runs, use command line arguments.
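
For example, assuming the Crail path from the question and taking spark.executor.extraLibraryPath / spark.driver.extraLibraryPath as the properties that extend the native library path, a cluster-wide setting in /etc/spark/conf/spark-defaults.conf (or the equivalent Cloudera Manager safety valve) might look like:

spark.executor.extraLibraryPath /opt/crail/crail-1.0-bin/lib
spark.driver.extraLibraryPath /opt/crail/crail-1.0-bin/lib

The per-run equivalent would be passing the same properties with --conf on the spark-submit command line.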