cdsw spark context issue

New Contributor

Hi, I am trying to start a Spark session in CDSW and I get the error shown below:

    TypeError: __init__() got an unexpected keyword argument 'auth_token'

The code I used:

    from pyspark import SparkContext
    from pyspark import SparkConf
    from pyspark.sql import HiveContext
    from pyspark.sql import SQLContext

    conf = SparkConf().set("spark.executor.memory", "12g") \
        .set("spark.yarn.executor.memoryOverhead", "3g") \
        .set("spark.dynamicAllocation.initialExecutors", "2") \
        .set("spark.driver.memory", "16g") \
        .set("spark.kryoserializer.buffer.max", "1g") \
        .set("spark.driver.cores", "32") \
        .set("spark.executor.cores", "8") \
        .set("spark.yarn.queue", "us9") \
        .set("spark.dynamicAllocation.maxExecutors", "32")

    sparkContext = SparkContext.getOrCreate(conf=conf)

Has anyone seen this error before, or does anyone know how to solve it? Thanks in advance.

1 ACCEPTED SOLUTION

Super Collaborator

Hi,

This is a known issue in the CDSW 1.3 release. Please read the documentation about it:

https://www.cloudera.com/documentation/data-science-workbench/1-3-x/topics/cdsw_known_issues.html#cd...

I also see that you are trying to create a SparkContext object. That should still work, but you might be better off using the newer Spark 2.x interfaces. You can see a few examples here:

https://www.cloudera.com/documentation/data-science-workbench/1-3-x/topics/cdsw_pyspark.html
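
As a rough sketch of what the Spark 2.x style could look like with the settings from the question: the helper name `build_session` and the app name `cdsw-example` are made up for illustration, and this only shows the `SparkSession` entry point. It does not by itself work around the known issue linked above.

```python
# Settings copied from the question; adjust them for your own cluster.
SPARK_SETTINGS = {
    "spark.executor.memory": "12g",
    "spark.yarn.executor.memoryOverhead": "3g",
    "spark.dynamicAllocation.initialExecutors": "2",
    "spark.driver.memory": "16g",
    "spark.kryoserializer.buffer.max": "1g",
    "spark.driver.cores": "32",
    "spark.executor.cores": "8",
    "spark.yarn.queue": "us9",
    "spark.dynamicAllocation.maxExecutors": "32",
}


def build_session(settings, app_name="cdsw-example"):
    """Create (or reuse) a SparkSession with the given settings applied.

    `build_session` and the default app name are hypothetical helpers,
    not part of CDSW or PySpark.
    """
    # Imported lazily so the settings can be inspected without pyspark installed.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName(app_name)
    for key, value in settings.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()


# Inside a CDSW session you would then run something like:
#   spark = build_session(SPARK_SETTINGS)
#   sc = spark.sparkContext   # the SparkContext is still reachable if you need it
#   spark.stop()
```

Keeping the settings in a plain dictionary makes it easy to see exactly what is passed to Spark and to change a single value without touching the session code.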

 

Regards,

Peter


3 REPLIES

New Contributor

Thank you so much! My problem has been solved.

Contributor

@peter_ableda Sorry to bother you. I have installed CDSW 1.4 on my CDSW machine, and when I try to start a SparkSession or run any HDFS commands, I get an UnknownHostException for the clouderamaster hostname. I am very new to Cloudera, so I am not sure which setup step I am missing. I followed the PySpark setup (importing the template when creating the project and starting a Python 2 environment to run the PySpark job). It would be a great help if you could point out what is missing from my setup. Thanks in advance!