Created on 09-13-2018 01:43 AM - edited 09-16-2022 06:42 AM
Hi, I am trying to start a Spark session via CDSW and got the error shown below:

TypeError: __init__() got an unexpected keyword argument 'auth_token'

The code I used:

```python
from pyspark import SparkContext
from pyspark import SparkConf
from pyspark.sql import HiveContext
from pyspark.sql import SQLContext

conf = SparkConf().set("spark.executor.memory", "12g") \
    .set("spark.yarn.executor.memoryOverhead", "3g") \
    .set("spark.dynamicAllocation.initialExecutors", "2") \
    .set("spark.driver.memory", "16g") \
    .set("spark.kryoserializer.buffer.max", "1g") \
    .set("spark.driver.cores", "32") \
    .set("spark.executor.cores", "8") \
    .set("spark.yarn.queue", "us9") \
    .set("spark.dynamicAllocation.maxExecutors", "32")

sparkContext = SparkContext.getOrCreate(conf=conf)
```

Has anyone run into this error before, or does anyone know how to solve it? Thanks in advance.
Created 09-13-2018 02:27 AM
Hi,
This is a known issue in the CDSW 1.3 release; please see the documentation about it.
I also see that you are creating a SparkContext object directly. That should still work, but you might be better off using the new Spark 2.x interfaces. You can find a few examples here:
https://www.cloudera.com/documentation/data-science-workbench/1-3-x/topics/cdsw_pyspark.html
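As a rough sketch of the Spark 2.x interface mentioned above, the same configuration from the original post could be expressed through SparkSession.builder (the property values are taken from the question; the app name is made up for illustration):

```python
from pyspark.sql import SparkSession

# Build a SparkSession with the same properties as the original
# SparkConf-based snippet. In Spark 2.x, SparkSession wraps the
# SparkContext, SQLContext and HiveContext in a single entry point.
spark = (
    SparkSession.builder
    .appName("cdsw-example")  # hypothetical app name
    .config("spark.executor.memory", "12g")
    .config("spark.yarn.executor.memoryOverhead", "3g")
    .config("spark.dynamicAllocation.initialExecutors", "2")
    .config("spark.driver.memory", "16g")
    .config("spark.kryoserializer.buffer.max", "1g")
    .config("spark.driver.cores", "32")
    .config("spark.executor.cores", "8")
    .config("spark.yarn.queue", "us9")
    .config("spark.dynamicAllocation.maxExecutors", "32")
    .enableHiveSupport()  # replaces the separate HiveContext
    .getOrCreate()
)

# The underlying SparkContext is still reachable if older APIs need it:
sc = spark.sparkContext
```

This is only a sketch; actually starting the session requires a working Spark/CDSW environment, so it is not runnable standalone.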
Regards,
Peter
Created 09-14-2018 02:33 AM
Thank you so much! My problem has been solved.
Created 11-17-2018 09:53 AM
@peter_ableda Sorry to bother you. I have installed CDSW 1.4 on my machine, and when I try to start a SparkSession or run any HDFS command, I get an UnknownHostException for the Cloudera master hostname. I am very new to Cloudera, so I am not sure which setup step I am missing. I followed the PySpark setup (imported the template while creating the project and started a Python 2 session to run the PySpark job). It would be a great help if you could point out what is missing from my setup. Thanks in advance!