Support Questions
Find answers, ask questions, and share your expertise

"RuntimeException: core-site.xml not found" while calling subprocess.call([])

Highlighted

"RuntimeException: core-site.xml not found" while calling subprocess.call([])

New Contributor

After upgrading (a fresh installation) to Cloudera CDH 6.1, all our ETL jobs (PySpark scripts) fail. Within the scripts we use subprocess.call([]) to work with HDFS directories; this worked on CDH 5.13 but fails on the current release with the following error: RuntimeException: core-site.xml not found
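
For illustration, the calls in our scripts look roughly like this (a simplified sketch; the HDFS path is a made-up example, not our actual job):

import subprocess

# Shell out to the hadoop CLI for HDFS housekeeping (listing, creating
# and cleaning directories) instead of using an HDFS client library.
ret = subprocess.call(["hadoop", "fs", "-ls", "/user/spark"])
if ret != 0:
    raise RuntimeError("hadoop fs exited with code %d" % ret)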

 

See the details below

 

$ sudo -u spark pyspark --master yarn --deploy-mode client 
Python 2.7.5 (default, Oct 30 2018, 23:45:53) 
[GCC 4.8.5 20150623 (Red Hat 4.8.5-36)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
19/03/11 20:24:42 WARN lineage.LineageWriter: Lineage directory /var/log/spark/lineage doesn't exist or is not writable. Lineage for this application will be disabled.
19/03/11 20:24:43 WARN lineage.LineageWriter: Lineage directory /var/log/spark/lineage doesn't exist or is not writable. Lineage for this application will be disabled.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/ '_/
   /__ / .__/\_,_/_/ /_/\_\   version 2.4.0-cdh6.1.0
      /_/

Using Python version 2.7.5 (default, Oct 30 2018 23:45:53)
SparkSession available as 'spark'.
>>> import subprocess
>>> subprocess.call(["hadoop", "fs", "-ls"])
WARNING: log4j.properties is not found. HADOOP_CONF_DIR may be incomplete.
Exception in thread "main" java.lang.RuntimeException: core-site.xml not found
    at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2891)
    at org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2839)
    at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2716)
    at org.apache.hadoop.conf.Configuration.set(Configuration.java:1353)
    at org.apache.hadoop.conf.Configuration.set(Configuration.java:1325)
    at org.apache.hadoop.conf.Configuration.setBoolean(Configuration.java:1666)
    at org.apache.hadoop.util.GenericOptionsParser.processGeneralOptions(GenericOptionsParser.java:339)
    at org.apache.hadoop.util.GenericOptionsParser.parseGeneralOptions(GenericOptionsParser.java:569)
    at org.apache.hadoop.util.GenericOptionsParser.<init>(GenericOptionsParser.java:174)
    at org.apache.hadoop.util.GenericOptionsParser.<init>(GenericOptionsParser.java:156)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
    at org.apache.hadoop.fs.FsShell.main(FsShell.java:389)

 

6 REPLIES

Re: &quot;RuntimeException: core-site.xml not found&quot; while calling subprocess.call([])

Expert Contributor

Hello @paramount2u,

 

It isn't able to find the Hadoop configuration files. You can set the configuration directory path as follows:

export HADOOP_CONF_DIR=<put your configuration directory path>

Hope that helps.


Re: &quot;RuntimeException: core-site.xml not found&quot; while calling subprocess.call([])

New Contributor

Hi, 

 

Thank you for your response! I tried setting HADOOP_CONF_DIR in bash before calling pyspark, but it did not help. pyspark itself sources the spark-env.sh script, which overrides the HADOOP_CONF_DIR variable (see below).

 

 

HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-$SPARK_CONF_DIR/yarn-conf}
HIVE_CONF_DIR=${HIVE_CONF_DIR:-/etc/hive/conf}
if [ -d "$HIVE_CONF_DIR" ]; then
  HADOOP_CONF_DIR="$HADOOP_CONF_DIR:$HIVE_CONF_DIR"
fi
export HADOOP_CONF_DIR

 

As a result, HADOOP_CONF_DIR is assigned a colon-separated string combining the two directories:

 

 

>>> import os
>>> os.getenv('HADOOP_CONF_DIR')
'/opt/cloudera/parcels/CDH-6.1.0-1.cdh6.1.0.p0.770702/lib/spark/conf/yarn-conf:/etc/hive/conf'

But when I set the value manually to point to a single directory (either of the two above), the subprocess call starts working:

 

>>> os.environ['HADOOP_CONF_DIR'] = "/opt/cloudera/parcels/CDH-6.1.0-1.cdh6.1.0.p0.770702/lib/spark/conf/yarn-conf"
>>> subprocess.call(["hadoop", "fs", "-ls"])
WARNING: log4j.properties is not found. HADOOP_CONF_DIR may be incomplete.
Found 2 items
drwxr-x---   - spark spark          0 2019-03-12 15:46 .sparkStaging
drwxrwxrwt   - spark spark          0 2019-03-12 15:46 applicationHistory

So, I assume that the problem comes from the code where HIVE_CONF_DIR is appended to HADOOP_CONF_DIR. 
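
In the meantime, we work around it in our scripts by overriding the variable only for the child process, so the driver's own environment stays untouched (the yarn-conf path is the one from our deployment shown above; adjust it for yours):

import os
import subprocess

# Pass a modified copy of the environment to the child process only.
env = os.environ.copy()
env["HADOOP_CONF_DIR"] = "/opt/cloudera/parcels/CDH-6.1.0-1.cdh6.1.0.p0.770702/lib/spark/conf/yarn-conf"
subprocess.call(["hadoop", "fs", "-ls"], env=env)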

Can you please check whether your deployment has these lines in the spark-env.sh script?

 


Re: &quot;RuntimeException: core-site.xml not found&quot; while calling subprocess.call([])

New Contributor

Is there any resolution to this? I'm seeing it as well.


Re: &quot;RuntimeException: core-site.xml not found&quot; while calling subprocess.call([])

Moderator

Please follow the steps below to change the spark-env.sh advanced configuration snippet:

1. Log in to Cloudera Manager.
2. Choose "SPARK2_ON_YARN-1" on the cluster.
3. Choose the "Configuration" tab on the displayed page.
4. Search for "Spark 2 Client Advanced Configuration Snippet (Safety Valve) for spark2-conf/spark-env.sh" in the search box.
5. In "Gateway Default Group", change the value to
export HADOOP_CONF_DIR=/etc/spark/conf/yarn-conf/*:/etc/hive/conf

Save the configuration and restart the service for the changes to take effect.
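
Once the client configuration is redeployed, you can verify the fix from a PySpark session with the same check as in the original report; the call should print the directory listing and return 0 instead of throwing the core-site.xml RuntimeException:

>>> import subprocess
>>> subprocess.call(["hadoop", "fs", "-ls"])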


Ferenc Erdelyi, Technical Solutions Manager



Re: &quot;RuntimeException: core-site.xml not found&quot; while calling subprocess.call([])

Explorer

We encountered the same problem after upgrading from CDH 5.15 to 6.3. The steps outlined by Bender helped us resolve the issue, with the following small differences:

  • We modified the advanced configuration of the Spark 2 service:
    Spark Client Advanced Configuration Snippet (Safety Valve) for spark-conf/spark-env.sh
  • The following line was added:
    export HADOOP_CONF_DIR=/etc/spark/conf/yarn-conf/*:/etc/hive/conf
  • No cluster or service restart was necessary; simply re-deploying the client configs did the trick.

Re: &quot;RuntimeException: core-site.xml not found&quot; while calling subprocess.call([])

Explorer

Corrected (appending the existing value so any directories already on the path are preserved):
export HADOOP_CONF_DIR=/etc/spark/conf/yarn-conf/*:/etc/hive/conf:$HADOOP_CONF_DIR
