Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant
Announcements
This board is archived and read-only for historical reference. To ask a new question, please post a new topic on the appropriate active board.

cdsw spark context issue

avatar
Visitor

Hi, I am trying to start a spark session via CDSW and met an error showed as below: TypeError: __init__() got an unexpected keyword argument 'auth_token' codes I used: from pyspark import SparkContext from pyspark import SparkConf from pyspark.sql import HiveContext from pyspark.sql import SQLContext conf = SparkConf().set("spark.executor.memory", "12g") \ .set("spark.yarn.executor.memoryOverhead", "3g") \ .set("spark.dynamicAllocation.initialExecutors", "2") \ .set("spark.driver.memory", "16g") \ .set("spark.kryoserializer.buffer.max", "1g") \ .set("spark.driver.cores", "32") \ .set("spark.executor.cores", "8") \ .set("spark.yarn.queue", "us9") \ .set("spark.dynamicAllocation.maxExecutors", "32") sparkContext = SparkContext.getOrCreate(conf=conf) Does anyone meet this error before or know about how to solve it? Thanks in advance.

1 ACCEPTED SOLUTION

avatar
Super Collaborator

Hi,

 

This is a known issue for the CDSW 1.3 release, please read the documentation about this:

https://www.cloudera.com/documentation/data-science-workbench/1-3-x/topics/cdsw_known_issues.html#cd...

 

I also see that you are trying to create a SparkContext object which still should work but you might be better off using the new Spark 2.x interfaces. You can see a few examples here:

https://www.cloudera.com/documentation/data-science-workbench/1-3-x/topics/cdsw_pyspark.html

 

Regards,

Peter

View solution in original post

3 REPLIES 3

avatar
Super Collaborator

Hi,

 

This is a known issue for the CDSW 1.3 release, please read the documentation about this:

https://www.cloudera.com/documentation/data-science-workbench/1-3-x/topics/cdsw_known_issues.html#cd...

 

I also see that you are trying to create a SparkContext object which still should work but you might be better off using the new Spark 2.x interfaces. You can see a few examples here:

https://www.cloudera.com/documentation/data-science-workbench/1-3-x/topics/cdsw_pyspark.html

 

Regards,

Peter

avatar
Visitor

Thank you so much! My problem has been solved.

avatar
Contributor

@peter_ableda Sorry to ask you. Actually I have installed cdsw 1.4 on my cdsw machine and when I am trying to start the sparksession/running any hdfs commands then I am getting the error as unknowhostException with the  clouderamaster hostname. I am very new to cloudera so not sure which set up i am missing as i followed the set up related to pyspark(by importing the template while creating the project and starting the pyhton 2 env to run the pyspark job). It would be great help if you can guide me something which I am missing from my set up. Thanks in Advance!!!