
"RuntimeException: core-site.xml not found" while calling subprocess.call([])


New Contributor

After upgrading (a fresh installation) to Cloudera CDH 6.1, all our ETLs (PySpark scripts) are failing. Within the scripts we use subprocess.call([]) to work with HDFS directories, which worked on CDH 5.13 but fails to execute on the current release. It throws the following error: RuntimeException: core-site.xml not found
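
For context, the calls in our scripts look roughly like this (a sketch with placeholder paths, not our real ETL code):

import subprocess

# Typical HDFS housekeeping done from the ETL scripts via the hadoop CLI.
# The paths below are placeholders for illustration only.
subprocess.call(["hadoop", "fs", "-mkdir", "-p", "/tmp/etl/staging"])
subprocess.call(["hadoop", "fs", "-ls", "/tmp/etl"])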

 

See the details below

 

$ sudo -u spark pyspark --master yarn --deploy-mode client 
Python 2.7.5 (default, Oct 30 2018, 23:45:53) 
[GCC 4.8.5 20150623 (Red Hat 4.8.5-36)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
19/03/11 20:24:42 WARN lineage.LineageWriter: Lineage directory /var/log/spark/lineage doesn't exist or is not writable. Lineage for this application will be disabled.
19/03/11 20:24:43 WARN lineage.LineageWriter: Lineage directory /var/log/spark/lineage doesn't exist or is not writable. Lineage for this application will be disabled.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 2.4.0-cdh6.1.0
      /_/

Using Python version 2.7.5 (default, Oct 30 2018 23:45:53)
SparkSession available as 'spark'.
>>> import subprocess
>>> subprocess.call(["hadoop", "fs", "-ls"])
WARNING: log4j.properties is not found. HADOOP_CONF_DIR may be incomplete.
Exception in thread "main" java.lang.RuntimeException: core-site.xml not found
at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2891)
at org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2839)
at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2716)
at org.apache.hadoop.conf.Configuration.set(Configuration.java:1353)
at org.apache.hadoop.conf.Configuration.set(Configuration.java:1325)
at org.apache.hadoop.conf.Configuration.setBoolean(Configuration.java:1666)
at org.apache.hadoop.util.GenericOptionsParser.processGeneralOptions(GenericOptionsParser.java:339)
at org.apache.hadoop.util.GenericOptionsParser.parseGeneralOptions(GenericOptionsParser.java:569)
at org.apache.hadoop.util.GenericOptionsParser.<init>(GenericOptionsParser.java:174)
at org.apache.hadoop.util.GenericOptionsParser.<init>(GenericOptionsParser.java:156)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
at org.apache.hadoop.fs.FsShell.main(FsShell.java:389)

 

6 REPLIES

Re: "RuntimeException: core-site.xml not found" while calling subprocess.call([])

Expert Contributor

Hello @paramount2u,

 

It isn't able to find the Hadoop configuration files. You can set the configuration directory path as follows:

export HADOOP_CONF_DIR=<put your configuration directory path>

Hope that helps.

Re: "RuntimeException: core-site.xml not found" while calling subprocess.call([])

New Contributor

Hi, 

 

Thank you for your response! I've tried setting HADOOP_CONF_DIR in bash before calling pyspark, but it did not help. pyspark itself sources the spark-env.sh script, which overrides the HADOOP_CONF_DIR variable (see below).

 

 

HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-$SPARK_CONF_DIR/yarn-conf}
HIVE_CONF_DIR=${HIVE_CONF_DIR:-/etc/hive/conf}
if [ -d "$HIVE_CONF_DIR" ]; then
  HADOOP_CONF_DIR="$HADOOP_CONF_DIR:$HIVE_CONF_DIR"
fi
export HADOOP_CONF_DIR

 

As a result, HADOOP_CONF_DIR is assigned a colon-separated string combining two directories:

 

 

>>> import os
>>> os.getenv('HADOOP_CONF_DIR')
'/opt/cloudera/parcels/CDH-6.1.0-1.cdh6.1.0.p0.770702/lib/spark/conf/yarn-conf:/etc/hive/conf'

But when I set the value manually to point to a single directory (either of the two above), the subprocess call starts working.

 

>>> os.environ['HADOOP_CONF_DIR'] = "/opt/cloudera/parcels/CDH-6.1.0-1.cdh6.1.0.p0.770702/lib/spark/conf/yarn-conf"
>>> subprocess.call(["hadoop", "fs", "-ls"])
WARNING: log4j.properties is not found. HADOOP_CONF_DIR may be incomplete.
Found 2 items
drwxr-x---   - spark spark          0 2019-03-12 15:46 .sparkStaging
drwxrwxrwt   - spark spark          0 2019-03-12 15:46 applicationHistory

So, I assume that the problem comes from the code where HIVE_CONF_DIR is appended to HADOOP_CONF_DIR. 

Can you please check whether your deployment has these lines in its spark-env.sh script?
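
For reference, a per-call variant of the same workaround (just a sketch; it assumes the first colon-separated entry of HADOOP_CONF_DIR is the yarn-conf directory, and it leaves the driver's own environment untouched):

import os
import subprocess

# Copy the environment and keep only the first entry of the colon-separated
# HADOOP_CONF_DIR for the child process, instead of mutating os.environ globally.
env = dict(os.environ)
env['HADOOP_CONF_DIR'] = env.get('HADOOP_CONF_DIR', '').split(':')[0]
subprocess.call(["hadoop", "fs", "-ls"], env=env)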

 

Re: "RuntimeException: core-site.xml not found" while calling subprocess.call([])

New Contributor

Is there any resolution to this? I'm seeing it as well.

Re: "RuntimeException: core-site.xml not found" while calling subprocess.call([])

Cloudera Employee

Please follow the steps below to change the spark-env.sh advanced configuration snippet:

1. Log in to Cloudera Manager.
2. Choose "SPARK2_ON_YARN-1" on the cluster.
3. Choose the "Configuration" tab on the displayed page.
4. Search for "Spark 2 Client Advanced Configuration Snippet (Safety Valve) for spark2-conf/spark-env.sh" in the search box displayed there.
5. In "Gateway Default Group", change the value to
export HADOOP_CONF_DIR=/etc/spark/conf/yarn-conf/*:/etc/hive/conf

Save the configuration and restart the service for the changes to take effect.
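
Once the client configuration has been redeployed, a quick sanity check from a pyspark session (a sketch that simply repeats the checks used earlier in this thread):

import os
import subprocess

# HADOOP_CONF_DIR should now include a directory containing core-site.xml
print(os.getenv('HADOOP_CONF_DIR'))

# The hadoop CLI call that previously failed should now return exit code 0
print(subprocess.call(["hadoop", "fs", "-ls"]))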

Re: "RuntimeException: core-site.xml not found" while calling subprocess.call([])

Explorer

We encountered the same problem after upgrading to CDH 6.3 from 5.15. The steps outlined by Bender helped us resolve the issue, with the following small differences:

  • We modified the advanced configuration of the Spark 2 service:
    Spark Client Advanced Configuration Snippet (Safety Valve) for spark-conf/spark-env.sh
  • The following line was added:
    export HADOOP_CONF_DIR=/etc/spark/conf/yarn-conf/*:/etc/hive/conf
  • No cluster or service restart was necessary; simply re-deploying the client configs did the trick.

Re: "RuntimeException: core-site.xml not found" while calling subprocess.call([])

Explorer

Corrected:
export HADOOP_CONF_DIR=/etc/spark/conf/yarn-conf/*:/etc/hive/conf:$HADOOP_CONF_DIR