09-13-2018 01:43 AM
Hi, I am trying to start a Spark session via CDSW and hit the error shown below:

TypeError: __init__() got an unexpected keyword argument 'auth_token'

The code I used:

from pyspark import SparkContext
from pyspark import SparkConf
from pyspark.sql import HiveContext
from pyspark.sql import SQLContext

conf = SparkConf().set("spark.executor.memory", "12g") \
    .set("spark.yarn.executor.memoryOverhead", "3g") \
    .set("spark.dynamicAllocation.initialExecutors", "2") \
    .set("spark.driver.memory", "16g") \
    .set("spark.kryoserializer.buffer.max", "1g") \
    .set("spark.driver.cores", "32") \
    .set("spark.executor.cores", "8") \
    .set("spark.yarn.queue", "us9") \
    .set("spark.dynamicAllocation.maxExecutors", "32")

sparkContext = SparkContext.getOrCreate(conf=conf)

Has anyone seen this error before, or does anyone know how to solve it? Thanks in advance.
09-13-2018 02:27 AM
This is a known issue in the CDSW 1.3 release. Please read the documentation about it:
I also see that you are trying to create a SparkContext object, which should still work, but you might be better off using the newer Spark 2.x interfaces. You can see a few examples here:
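As a rough illustration of the Spark 2.x interface mentioned above, here is a minimal sketch that builds a SparkSession from the same settings used in the question. The helper name build_session and the application name "cdsw-example" are made up for this example, and this only shows the newer entry point; it is not a fix for the auth_token error itself.

```python
# Settings copied from the question; adjust them for your own cluster.
SETTINGS = {
    "spark.executor.memory": "12g",
    "spark.yarn.executor.memoryOverhead": "3g",
    "spark.dynamicAllocation.initialExecutors": "2",
    "spark.driver.memory": "16g",
    "spark.kryoserializer.buffer.max": "1g",
    "spark.driver.cores": "32",
    "spark.executor.cores": "8",
    "spark.yarn.queue": "us9",
    "spark.dynamicAllocation.maxExecutors": "32",
}


def build_session(app_name="cdsw-example", settings=SETTINGS):
    """Create (or reuse) a SparkSession configured with `settings`.

    `build_session` and the default app name are hypothetical names
    chosen for this sketch, not part of any Cloudera template.
    """
    # Imported lazily so the settings can be inspected without pyspark.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName(app_name)
    for key, value in settings.items():
        builder = builder.config(key, value)
    # Call builder.enableHiveSupport() here if Hive tables are needed.
    return builder.getOrCreate()


# Usage inside a CDSW Python session:
#   spark = build_session()
#   sc = spark.sparkContext   # the underlying SparkContext, if still needed
#   spark.stop()
```

The SparkSession wraps the SparkContext, so existing code that needs the context can still get it from spark.sparkContext.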
11-17-2018 09:53 AM
@peter.ableda Sorry to bother you. I have installed CDSW 1.4 on my CDSW machine, and when I try to start a SparkSession or run any HDFS commands, I get an UnknownHostException for the clouderamaster hostname. I am very new to Cloudera, so I am not sure which setup step I am missing. I followed the PySpark setup (importing the template when creating the project and starting a Python 2 environment to run the PySpark job). It would be a great help if you could point out what I am missing in my setup. Thanks in advance!