Created on 03-11-2019 09:35 AM - edited 09-16-2022 07:13 AM
After upgrading (fresh installation) to Cloudera CDH 6.1, all of our ETLs (pyspark scripts) are failing. Within the scripts we use subprocess.call([]) to work with HDFS directories. This worked on CDH 5.13 but fails on the current release with the following error: RuntimeException: core-site.xml not found
See the details below:
$ sudo -u spark pyspark --master yarn --deploy-mode client
Python 2.7.5 (default, Oct 30 2018, 23:45:53)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-36)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
19/03/11 20:24:42 WARN lineage.LineageWriter: Lineage directory /var/log/spark/lineage doesn't exist or is not writable. Lineage for this application will be disabled.
19/03/11 20:24:43 WARN lineage.LineageWriter: Lineage directory /var/log/spark/lineage doesn't exist or is not writable. Lineage for this application will be disabled.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 2.4.0-cdh6.1.0
      /_/

Using Python version 2.7.5 (default, Oct 30 2018 23:45:53)
SparkSession available as 'spark'.
>>> import subprocess
>>> subprocess.call(["hadoop", "fs", "-ls"])
WARNING: log4j.properties is not found. HADOOP_CONF_DIR may be incomplete.
Exception in thread "main" java.lang.RuntimeException: core-site.xml not found
    at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2891)
    at org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2839)
    at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2716)
    at org.apache.hadoop.conf.Configuration.set(Configuration.java:1353)
    at org.apache.hadoop.conf.Configuration.set(Configuration.java:1325)
    at org.apache.hadoop.conf.Configuration.setBoolean(Configuration.java:1666)
    at org.apache.hadoop.util.GenericOptionsParser.processGeneralOptions(GenericOptionsParser.java:339)
    at org.apache.hadoop.util.GenericOptionsParser.parseGeneralOptions(GenericOptionsParser.java:569)
    at org.apache.hadoop.util.GenericOptionsParser.<init>(GenericOptionsParser.java:174)
    at org.apache.hadoop.util.GenericOptionsParser.<init>(GenericOptionsParser.java:156)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
    at org.apache.hadoop.fs.FsShell.main(FsShell.java:389)
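Since the warning points at HADOOP_CONF_DIR, one way to narrow this down might be to check what the pyspark shell would hand to the child process. The sketch below is only a diagnostic aid, not a fix: the helper name describe_hadoop_conf is hypothetical, and it assumes the hadoop command inherits the shell's environment, which is what subprocess.call does by default when no env argument is passed.

```python
import os


def describe_hadoop_conf(env=None):
    """Report the Hadoop config directory a child process would inherit.

    `env` defaults to the current process environment, which is what
    subprocess.call passes on when no `env` argument is given.
    """
    env = os.environ if env is None else env
    conf_dir = env.get("HADOOP_CONF_DIR")
    report = {"HADOOP_CONF_DIR": conf_dir, "exists": False, "missing": []}
    if conf_dir and os.path.isdir(conf_dir):
        report["exists"] = True
        # These are the two files named in the warning and the exception.
        for name in ("core-site.xml", "log4j.properties"):
            if not os.path.isfile(os.path.join(conf_dir, name)):
                report["missing"].append(name)
    return report
```

Calling print(describe_hadoop_conf()) from the same pyspark session, just before the subprocess.call, would show whether the variable is set, whether the directory exists, and which of the two files are absent from it.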