Support Questions
Find answers, ask questions, and share your expertise

PySpark - Error initializing SparkContext

New Contributor

 

We are running into issues when we launch PySpark (with or without Yarn).

It seems to be looking for hive-site.xml file which we already copied to spark configuration path but I am not sure if there are any specific parameters that should be part of.

 

 

 

 

 

 

[apps@devdm003.dev1 ~]$ pyspark --master yarn --verbose
WARNING: User-defined SPARK_HOME (/opt/spark) overrides detected (/opt/cloudera/parcels/CDH-5.5.4-1.cdh5.5.4.p0.9/lib/spark).
WARNING: Running pyspark from user-defined location.
Python 2.7.8 (default, Oct 22 2016, 09:02:55)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-17)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
Using properties file: /opt/cloudera/parcels/CDH-5.5.4-1.cdh5.5.4.p0.9/lib/spark/conf/spark-defaults.conf
Adding default property: spark.serializer=org.apache.spark.serializer.KryoSerializer
Adding default property: spark.yarn.jars=hdfs://devdm001.dev1.turn.com:8020/user/spark/spark-2.1-bin-hadoop/*
Adding default property: spark.eventLog.enabled=true
Adding default property: spark.shuffle.service.enabled=true
Adding default property: spark.driver.extraLibraryPath=/opt/cloudera/parcels/CDH-5.5.4-1.cdh5.5.4.p0.9/lib/hadoop/lib/native
Adding default property: spark.yarn.historyServer.address=http://devdm004.dev1.turn.com:18088
Adding default property: spark.dynamicAllocation.schedulerBacklogTimeout=1
Adding default property: spark.yarn.am.extraLibraryPath=/opt/cloudera/parcels/CDH-5.5.4-1.cdh5.5.4.p0.9/lib/hadoop/lib/native
Adding default property: spark.yarn.config.gatewayPath=/opt/cloudera/parcels
Adding default property: spark.yarn.config.replacementPath={{HADOOP_COMMON_HOME}}/../../..
Adding default property: spark.shuffle.service.port=7337
Adding default property: spark.master=yarn
Adding default property: spark.authenticate=false
Adding default property: spark.executor.extraLibraryPath=/opt/cloudera/parcels/CDH-5.5.4-1.cdh5.5.4.p0.9/lib/hadoop/lib/native
Adding default property: spark.eventLog.dir=hdfs://devdm001.dev1.turn.com:8020/user/spark/applicationHistory
Adding default property: spark.dynamicAllocation.enabled=true
Adding default property: spark.dynamicAllocation.minExecutors=0
Adding default property: spark.dynamicAllocation.executorIdleTimeout=60
Parsed arguments:
master yarn
deployMode null
executorMemory null
executorCores null
totalExecutorCores null
propertiesFile /opt/cloudera/parcels/CDH-5.5.4-1.cdh5.5.4.p0.9/lib/spark/conf/spark-defaults.conf
driverMemory null
driverCores null
driverExtraClassPath null
driverExtraLibraryPath /opt/cloudera/parcels/CDH-5.5.4-1.cdh5.5.4.p0.9/lib/hadoop/lib/native
driverExtraJavaOptions null
supervise false
queue null
numExecutors null
files null
pyFiles null
archives null
mainClass null
primaryResource pyspark-shell
name PySparkShell
childArgs []
jars null
packages null
packagesExclusions null
repositories null
verbose true

Spark properties used, including those specified through
--conf and those from the properties file /opt/cloudera/parcels/CDH-5.5.4-1.cdh5.5.4.p0.9/lib/spark/conf/spark-defaults.conf:
spark.executor.extraLibraryPath -> /opt/cloudera/parcels/CDH-5.5.4-1.cdh5.5.4.p0.9/lib/hadoop/lib/native
spark.yarn.jars -> hdfs://devdm001.dev1.turn.com:8020/user/spark/spark-2.1-bin-hadoop/*
spark.driver.extraLibraryPath -> /opt/cloudera/parcels/CDH-5.5.4-1.cdh5.5.4.p0.9/lib/hadoop/lib/native
spark.authenticate -> false
spark.yarn.historyServer.address -> http://devdm004.dev1.turn.com:18088
spark.yarn.am.extraLibraryPath -> /opt/cloudera/parcels/CDH-5.5.4-1.cdh5.5.4.p0.9/lib/hadoop/lib/native
spark.eventLog.enabled -> true
spark.dynamicAllocation.schedulerBacklogTimeout -> 1
spark.yarn.config.gatewayPath -> /opt/cloudera/parcels
spark.serializer -> org.apache.spark.serializer.KryoSerializer
spark.dynamicAllocation.executorIdleTimeout -> 60
spark.dynamicAllocation.minExecutors -> 0
spark.shuffle.service.enabled -> true
spark.yarn.config.replacementPath -> {{HADOOP_COMMON_HOME}}/../../..
spark.shuffle.service.port -> 7337
spark.eventLog.dir -> hdfs://devdm001.dev1.turn.com:8020/user/spark/applicationHistory
spark.master -> yarn
spark.dynamicAllocation.enabled -> true


Main class:
org.apache.spark.api.python.PythonGatewayServer
Arguments:

System properties:
spark.executor.extraLibraryPath -> /opt/cloudera/parcels/CDH-5.5.4-1.cdh5.5.4.p0.9/lib/hadoop/lib/native
spark.driver.extraLibraryPath -> /opt/cloudera/parcels/CDH-5.5.4-1.cdh5.5.4.p0.9/lib/hadoop/lib/native
spark.yarn.jars -> hdfs://devdm001.dev1.turn.com:8020/user/spark/spark-2.1-bin-hadoop/*
spark.authenticate -> false
spark.yarn.historyServer.address -> http://devdm004.dev1.turn.com:18088
spark.yarn.am.extraLibraryPath -> /opt/cloudera/parcels/CDH-5.5.4-1.cdh5.5.4.p0.9/lib/hadoop/lib/native
spark.eventLog.enabled -> true
spark.dynamicAllocation.schedulerBacklogTimeout -> 1
SPARK_SUBMIT -> true
spark.yarn.config.gatewayPath -> /opt/cloudera/parcels
spark.serializer -> org.apache.spark.serializer.KryoSerializer
spark.shuffle.service.enabled -> true
spark.dynamicAllocation.minExecutors -> 0
spark.dynamicAllocation.executorIdleTimeout -> 60
spark.app.name -> PySparkShell
spark.yarn.config.replacementPath -> {{HADOOP_COMMON_HOME}}/../../..
spark.submit.deployMode -> client
spark.shuffle.service.port -> 7337
spark.eventLog.dir -> hdfs://devdm001.dev1.turn.com:8020/user/spark/applicationHistory
spark.master -> yarn
spark.yarn.isPython -> true
spark.dynamicAllocation.enabled -> true
Classpath elements:

 

log4j:ERROR Could not find value for key log4j.appender.WARN
log4j:ERROR Could not instantiate appender named "WARN".
log4j:ERROR Could not find value for key log4j.appender.DEBUG
log4j:ERROR Could not instantiate appender named "DEBUG".
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-5.5.4-1.cdh5.5.4.p0.9/jars/avro-tools-1.7.6-cdh5.5.4.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-5.5.4-1.cdh5.5.4.p0.9/jars/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/server/turn/deploy/160622/turn/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
Traceback (most recent call last):
File "/opt/spark/python/pyspark/shell.py", line 43, in <module>
spark = SparkSession.builder\
File "/opt/spark/python/pyspark/sql/session.py", line 179, in getOrCreate
session._jsparkSession.sessionState().conf().setConfString(key, value)
File "/opt/spark/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 1133, in __call__
File "/opt/spark/python/pyspark/sql/utils.py", line 79, in deco
raise IllegalArgumentException(s.split(': ', 1)[1], stackTrace)
pyspark.sql.utils.IllegalArgumentException: u"Error while instantiating 'org.apache.spark.sql.hive.HiveSessionState':"

 

 

 

 

 

We installed Spark 2.1 for business reasons and updated SPARK_HOME variable in safety valve.

(Ensured SPARK_HOME is set early in spark-env.sh so other PATH variables are set properly).

 

I also learnt that there is no hive-site.xml dependency with spark 2.1 which confuses me more for reasons it is looking into.

 

Did anyone face similar issue, any suggestions? This is a linux environment running CDH5.5.4

 

 

 

 

4 REPLIES 4

Contributor

By Spark 2.1 do you mean Cloudera Spark 2.0 Release 1 or Apache Spark 2.1 ? 

 

Regarding Cloudera Spark 2.0 Release 1 or Release 2, I would like to tell that minimum required CDH version is CDH 5.7.x but you are on CDH 5.5.4. 

New Contributor
It is apache spark 2.1

One Reference used (similar to our situation) :


https://www.linkedin.com/pulse/running-spark-2xx-cloudera-hadoop-distro-cdh-deenar-toraskar-cfa

New Contributor

The problem seems to be with configuration rather than dependency, I am not sure what configuration is missing. 

 

Here is my configuration  :

 

spark-defaults.conf :

 

spark.authenticate=false
spark.dynamicAllocation.enabled=true
spark.dynamicAllocation.executorIdleTimeout=60
spark.dynamicAllocation.minExecutors=0
spark.dynamicAllocation.schedulerBacklogTimeout=1
spark.eventLog.dir=hdfs://dtest.turn.com:8020/user/spark/applicationHistory
spark.eventLog.enabled=true
spark.serializer=org.apache.spark.serializer.KryoSerializer
spark.shuffle.service.enabled=true
spark.shuffle.service.port=7337
spark.master=yarn
spark.yarn.jars=hdfs://dtest.turn.com:8020/user/spark/spark-2.1-bin-hadoop/*
spark.yarn.historyServer.address=http://dtest.turn.com:18088
spark.driver.extraLibraryPath=/opt/cloudera/parcels/CDH-5.5.4-1.cdh5.5.4.p0.9/lib/hadoop/lib/native
spark.executor.extraLibraryPath=/opt/cloudera/parcels/CDH-5.5.4-1.cdh5.5.4.p0.9/lib/hadoop/lib/native
spark.yarn.am.extraLibraryPath=/opt/cloudera/parcels/CDH-5.5.4-1.cdh5.5.4.p0.9/lib/hadoop/lib/native
spark.yarn.config.gatewayPath=/opt/cloudera/parcels
spark.yarn.config.replacementPath={{HADOOP_COMMON_HOME}}/../../..

 

 

 

 

New Contributor

while i running this programming i got these error can anyone help me with this

[cloudera@quickstart ~]$ spark-shell --master yarn-client
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel).
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/lib/zookeeper/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/flume-ng/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/parquet/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/avro/avro-tools-1.7.6-cdh5.13.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Welcome to
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/___/ .__/\_,_/_/ /_/\_\ version 1.6.0
/_/

Using Scala version 2.10.5 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_67)
Type in expressions to have them evaluated.
Type :help for more information.
20/02/05 22:03:26 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
20/02/05 22:03:42 WARN shortcircuit.DomainSocketFactory: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
Spark context available as sc (master = yarn-client, app id = application_1580968178673_0001).
SQL context available as sqlContext.