<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question PySpark - Error initializing SparkContext in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/PySpark-Error-initializing-SparkContext/m-p/51858#M23510</link>
    <description>&lt;P&gt;We are running into issues when we launch PySpark (with or without YARN).&lt;/P&gt;&lt;P&gt;It seems to be looking for the hive-site.xml file, which we have already copied to the Spark configuration path, but I am not sure whether it needs any specific parameters.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;[apps@devdm003.dev1 ~]$ pyspark --master yarn --verbose&lt;BR /&gt;WARNING: User-defined SPARK_HOME (/opt/spark) overrides detected (/opt/cloudera/parcels/CDH-5.5.4-1.cdh5.5.4.p0.9/lib/spark).&lt;BR /&gt;WARNING: Running pyspark from user-defined location.&lt;BR /&gt;Python 2.7.8 (default, Oct 22 2016, 09:02:55)&lt;BR /&gt;[GCC 4.4.7 20120313 (Red Hat 4.4.7-17)] on linux2&lt;BR /&gt;Type "help", "copyright", "credits" or "license" for more information.&lt;BR /&gt;Using properties file: /opt/cloudera/parcels/CDH-5.5.4-1.cdh5.5.4.p0.9/lib/spark/conf/spark-defaults.conf&lt;BR /&gt;Adding default property: spark.serializer=org.apache.spark.serializer.KryoSerializer&lt;BR /&gt;Adding default property: spark.yarn.jars=hdfs://devdm001.dev1.turn.com:8020/user/spark/spark-2.1-bin-hadoop/*&lt;BR /&gt;Adding default property: spark.eventLog.enabled=true&lt;BR /&gt;Adding default property: spark.shuffle.service.enabled=true&lt;BR /&gt;Adding default property: spark.driver.extraLibraryPath=/opt/cloudera/parcels/CDH-5.5.4-1.cdh5.5.4.p0.9/lib/hadoop/lib/native&lt;BR /&gt;Adding default property: spark.yarn.historyServer.address=&lt;A href="http://devdm004.dev1.turn.com:18088" target="_blank"&gt;http://devdm004.dev1.turn.com:18088&lt;/A&gt;&lt;BR /&gt;Adding default property: spark.dynamicAllocation.schedulerBacklogTimeout=1&lt;BR /&gt;Adding default property: spark.yarn.am.extraLibraryPath=/opt/cloudera/parcels/CDH-5.5.4-1.cdh5.5.4.p0.9/lib/hadoop/lib/native&lt;BR 
/&gt;Adding default property: spark.yarn.config.gatewayPath=/opt/cloudera/parcels&lt;BR /&gt;Adding default property: spark.yarn.config.replacementPath={{HADOOP_COMMON_HOME}}/../../..&lt;BR /&gt;Adding default property: spark.shuffle.service.port=7337&lt;BR /&gt;Adding default property: spark.master=yarn&lt;BR /&gt;Adding default property: spark.authenticate=false&lt;BR /&gt;Adding default property: spark.executor.extraLibraryPath=/opt/cloudera/parcels/CDH-5.5.4-1.cdh5.5.4.p0.9/lib/hadoop/lib/native&lt;BR /&gt;Adding default property: spark.eventLog.dir=hdfs://devdm001.dev1.turn.com:8020/user/spark/applicationHistory&lt;BR /&gt;Adding default property: spark.dynamicAllocation.enabled=true&lt;BR /&gt;Adding default property: spark.dynamicAllocation.minExecutors=0&lt;BR /&gt;Adding default property: spark.dynamicAllocation.executorIdleTimeout=60&lt;BR /&gt;Parsed arguments:&lt;BR /&gt;master yarn&lt;BR /&gt;deployMode null&lt;BR /&gt;executorMemory null&lt;BR /&gt;executorCores null&lt;BR /&gt;totalExecutorCores null&lt;BR /&gt;propertiesFile /opt/cloudera/parcels/CDH-5.5.4-1.cdh5.5.4.p0.9/lib/spark/conf/spark-defaults.conf&lt;BR /&gt;driverMemory null&lt;BR /&gt;driverCores null&lt;BR /&gt;driverExtraClassPath null&lt;BR /&gt;driverExtraLibraryPath /opt/cloudera/parcels/CDH-5.5.4-1.cdh5.5.4.p0.9/lib/hadoop/lib/native&lt;BR /&gt;driverExtraJavaOptions null&lt;BR /&gt;supervise false&lt;BR /&gt;queue null&lt;BR /&gt;numExecutors null&lt;BR /&gt;files null&lt;BR /&gt;pyFiles null&lt;BR /&gt;archives null&lt;BR /&gt;mainClass null&lt;BR /&gt;primaryResource pyspark-shell&lt;BR /&gt;name PySparkShell&lt;BR /&gt;childArgs []&lt;BR /&gt;jars null&lt;BR /&gt;packages null&lt;BR /&gt;packagesExclusions null&lt;BR /&gt;repositories null&lt;BR /&gt;verbose true&lt;/P&gt;&lt;P&gt;Spark properties used, including those specified through&lt;BR /&gt;--conf and those from the properties file /opt/cloudera/parcels/CDH-5.5.4-1.cdh5.5.4.p0.9/lib/spark/conf/spark-defaults.conf:&lt;BR 
/&gt;spark.executor.extraLibraryPath -&amp;gt; /opt/cloudera/parcels/CDH-5.5.4-1.cdh5.5.4.p0.9/lib/hadoop/lib/native&lt;BR /&gt;spark.yarn.jars -&amp;gt; hdfs://devdm001.dev1.turn.com:8020/user/spark/spark-2.1-bin-hadoop/*&lt;BR /&gt;spark.driver.extraLibraryPath -&amp;gt; /opt/cloudera/parcels/CDH-5.5.4-1.cdh5.5.4.p0.9/lib/hadoop/lib/native&lt;BR /&gt;spark.authenticate -&amp;gt; false&lt;BR /&gt;spark.yarn.historyServer.address -&amp;gt; &lt;A href="http://devdm004.dev1.turn.com:18088" target="_blank"&gt;http://devdm004.dev1.turn.com:18088&lt;/A&gt;&lt;BR /&gt;spark.yarn.am.extraLibraryPath -&amp;gt; /opt/cloudera/parcels/CDH-5.5.4-1.cdh5.5.4.p0.9/lib/hadoop/lib/native&lt;BR /&gt;spark.eventLog.enabled -&amp;gt; true&lt;BR /&gt;spark.dynamicAllocation.schedulerBacklogTimeout -&amp;gt; 1&lt;BR /&gt;spark.yarn.config.gatewayPath -&amp;gt; /opt/cloudera/parcels&lt;BR /&gt;spark.serializer -&amp;gt; org.apache.spark.serializer.KryoSerializer&lt;BR /&gt;spark.dynamicAllocation.executorIdleTimeout -&amp;gt; 60&lt;BR /&gt;spark.dynamicAllocation.minExecutors -&amp;gt; 0&lt;BR /&gt;spark.shuffle.service.enabled -&amp;gt; true&lt;BR /&gt;spark.yarn.config.replacementPath -&amp;gt; {{HADOOP_COMMON_HOME}}/../../..&lt;BR /&gt;spark.shuffle.service.port -&amp;gt; 7337&lt;BR /&gt;spark.eventLog.dir -&amp;gt; hdfs://devdm001.dev1.turn.com:8020/user/spark/applicationHistory&lt;BR /&gt;spark.master -&amp;gt; yarn&lt;BR /&gt;spark.dynamicAllocation.enabled -&amp;gt; true&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;Main class:&lt;BR /&gt;org.apache.spark.api.python.PythonGatewayServer&lt;BR /&gt;Arguments:&lt;/P&gt;&lt;P&gt;System properties:&lt;BR /&gt;spark.executor.extraLibraryPath -&amp;gt; /opt/cloudera/parcels/CDH-5.5.4-1.cdh5.5.4.p0.9/lib/hadoop/lib/native&lt;BR /&gt;spark.driver.extraLibraryPath -&amp;gt; /opt/cloudera/parcels/CDH-5.5.4-1.cdh5.5.4.p0.9/lib/hadoop/lib/native&lt;BR /&gt;spark.yarn.jars -&amp;gt; hdfs://devdm001.dev1.turn.com:8020/user/spark/spark-2.1-bin-hadoop/*&lt;BR 
/&gt;spark.authenticate -&amp;gt; false&lt;BR /&gt;spark.yarn.historyServer.address -&amp;gt; &lt;A href="http://devdm004.dev1.turn.com:18088" target="_blank"&gt;http://devdm004.dev1.turn.com:18088&lt;/A&gt;&lt;BR /&gt;spark.yarn.am.extraLibraryPath -&amp;gt; /opt/cloudera/parcels/CDH-5.5.4-1.cdh5.5.4.p0.9/lib/hadoop/lib/native&lt;BR /&gt;spark.eventLog.enabled -&amp;gt; true&lt;BR /&gt;spark.dynamicAllocation.schedulerBacklogTimeout -&amp;gt; 1&lt;BR /&gt;SPARK_SUBMIT -&amp;gt; true&lt;BR /&gt;spark.yarn.config.gatewayPath -&amp;gt; /opt/cloudera/parcels&lt;BR /&gt;spark.serializer -&amp;gt; org.apache.spark.serializer.KryoSerializer&lt;BR /&gt;spark.shuffle.service.enabled -&amp;gt; true&lt;BR /&gt;spark.dynamicAllocation.minExecutors -&amp;gt; 0&lt;BR /&gt;spark.dynamicAllocation.executorIdleTimeout -&amp;gt; 60&lt;BR /&gt;spark.app.name -&amp;gt; PySparkShell&lt;BR /&gt;spark.yarn.config.replacementPath -&amp;gt; {{HADOOP_COMMON_HOME}}/../../..&lt;BR /&gt;spark.submit.deployMode -&amp;gt; client&lt;BR /&gt;spark.shuffle.service.port -&amp;gt; 7337&lt;BR /&gt;spark.eventLog.dir -&amp;gt; hdfs://devdm001.dev1.turn.com:8020/user/spark/applicationHistory&lt;BR /&gt;spark.master -&amp;gt; yarn&lt;BR /&gt;spark.yarn.isPython -&amp;gt; true&lt;BR /&gt;spark.dynamicAllocation.enabled -&amp;gt; true&lt;BR /&gt;Classpath elements:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;log4j:ERROR Could not find value for key log4j.appender.WARN&lt;BR /&gt;log4j:ERROR Could not instantiate appender named "WARN".&lt;BR /&gt;log4j:ERROR Could not find value for key log4j.appender.DEBUG&lt;BR /&gt;log4j:ERROR Could not instantiate appender named "DEBUG".&lt;BR /&gt;Setting default log level to "WARN".&lt;BR /&gt;To adjust logging level use sc.setLogLevel(newLevel). 
For SparkR, use setLogLevel(newLevel).&lt;BR /&gt;SLF4J: Class path contains multiple SLF4J bindings.&lt;BR /&gt;SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-5.5.4-1.cdh5.5.4.p0.9/jars/avro-tools-1.7.6-cdh5.5.4.jar!/org/slf4j/impl/StaticLoggerBinder.class]&lt;BR /&gt;SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-5.5.4-1.cdh5.5.4.p0.9/jars/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]&lt;BR /&gt;SLF4J: Found binding in [jar:file:/server/turn/deploy/160622/turn/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]&lt;BR /&gt;SLF4J: See &lt;A href="http://www.slf4j.org/codes.html#multiple_bindings" target="_blank"&gt;http://www.slf4j.org/codes.html#multiple_bindings&lt;/A&gt; for an explanation.&lt;BR /&gt;Traceback (most recent call last):&lt;BR /&gt;File "/opt/spark/python/pyspark/shell.py", line 43, in &amp;lt;module&amp;gt;&lt;BR /&gt;spark = SparkSession.builder\&lt;BR /&gt;File "/opt/spark/python/pyspark/sql/session.py", line 179, in getOrCreate&lt;BR /&gt;session._jsparkSession.sessionState().conf().setConfString(key, value)&lt;BR /&gt;File "/opt/spark/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 1133, in __call__&lt;BR /&gt;File "/opt/spark/python/pyspark/sql/utils.py", line 79, in deco&lt;BR /&gt;raise IllegalArgumentException(s.split(': ', 1)[1], stackTrace)&lt;BR /&gt;pyspark.sql.utils.IllegalArgumentException: u"Error while instantiating 'org.apache.spark.sql.hive.HiveSessionState':"&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;We installed Spark 2.1 for business reasons and updated the SPARK_HOME variable in the safety valve.&lt;/P&gt;&lt;P&gt;(We ensured SPARK_HOME is set early in spark-env.sh so the other PATH variables are set properly.)&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I also learned that Spark 2.1 has no hive-site.xml dependency, which makes it even more confusing that it seems to be looking for one.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Has anyone faced a similar issue? Any suggestions? This is a Linux environment running CDH 5.5.4.&lt;/P&gt;</description>
    <pubDate>Fri, 16 Sep 2022 11:12:30 GMT</pubDate>
    <dc:creator>Murthy</dc:creator>
    <dc:date>2022-09-16T11:12:30Z</dc:date>
  </channel>
</rss>

