Created on 11-28-2017 06:02 PM - edited 09-16-2022 05:34 AM
Hi,
I have been trying to set an environment variable in Spark, but I am running into problems.
I am using HDFS/YARN from CDH 5.12 together with a standalone Spark (v2.2.0) and Crail (https://github.com/zrlio/crail). However, the YARN logs show an error saying that Crail's library path is not included in java.library.path:
...
17/11/27 10:57:50 INFO ibm.crail: crail.storage.rdma.type passive
17/11/27 10:57:50 INFO ibm.disni: creating RdmaProvider of type 'nat'
Exception in thread "dag-scheduler-event-loop" java.lang.UnsatisfiedLinkError: no disni in java.library.path
at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1867)
at java.lang.Runtime.loadLibrary0(Runtime.java:870)
at java.lang.System.loadLibrary(System.java:1122)
....
I found a post in Crail's user group saying this can be fixed by setting the following environment variable:
LD_LIBRARY_PATH=/opt/crail/crail-1.0-bin/lib:$LD_LIBRARY_PATH
or:
spark.executor.extraJavaOptions -Djava.library.path=/opt/crail/crail-1.0-bin/lib
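For reference, here is how the two options from that post would typically be applied when launching a job (a sketch; app.jar is a placeholder, and the library path is the one quoted above, so adjust both to your install):

```shell
# Option 1: export the native library path before launching Spark
# (path taken from the Crail post above; adjust to your install)
export LD_LIBRARY_PATH=/opt/crail/crail-1.0-bin/lib:$LD_LIBRARY_PATH

# Option 2: pass java.library.path to the driver and executors via spark-submit
# (app.jar is a placeholder application jar)
spark-submit \
  --conf spark.driver.extraJavaOptions=-Djava.library.path=/opt/crail/crail-1.0-bin/lib \
  --conf spark.executor.extraJavaOptions=-Djava.library.path=/opt/crail/crail-1.0-bin/lib \
  app.jar
```

Note that spark.executor.extraJavaOptions only affects executor JVMs; the driver needs spark.driver.extraJavaOptions (or the exported LD_LIBRARY_PATH) as well.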
Here is the post:
https://groups.google.com/forum/#!topic/zrlio-users/_P5NeH3iHxE
Can you please advise where I should set this environment variable inside CDH?
I tried setting it in ~/.bashrc and in spark-env.sh, but that didn't work; it seems CDH resets all environment variables when starting services.
I also tried setting it in every place I could find inside CDH, including the Environment configuration of the Cloudera Management Service, YARN, and HDFS, but the problem persists.
Thanks,
Kevin
Created 12-08-2017 06:41 AM
Created 11-30-2017 06:04 PM
To simplify the question: how can I set multiple Environment Variables under "yarn.nodemanager.admin-en"?
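For example, I would expect something like the following in yarn-site.xml (a sketch, assuming the property meant above is yarn.nodemanager.admin-env, which takes a comma-separated list of KEY=VALUE pairs; MALLOC_ARENA_MAX=$MALLOC_ARENA_MAX is its stock default value, kept here so it is not lost):

```xml
<property>
  <name>yarn.nodemanager.admin-env</name>
  <value>MALLOC_ARENA_MAX=$MALLOC_ARENA_MAX,LD_LIBRARY_PATH=/opt/crail/crail-1.0-bin/lib:$LD_LIBRARY_PATH</value>
</property>
```

In Cloudera Manager this would presumably go into the YARN safety-valve/advanced-configuration snippet for yarn-site.xml rather than a hand-edited file, but I am not sure which field is the right one.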