CDSW Spark context issue
Created on 09-13-2018 01:43 AM - edited 09-16-2022 06:42 AM
Hi, I am trying to start a Spark session via CDSW and hit the following error:

TypeError: __init__() got an unexpected keyword argument 'auth_token'

The code I used:

```python
from pyspark import SparkContext
from pyspark import SparkConf
from pyspark.sql import HiveContext
from pyspark.sql import SQLContext

conf = SparkConf().set("spark.executor.memory", "12g") \
    .set("spark.yarn.executor.memoryOverhead", "3g") \
    .set("spark.dynamicAllocation.initialExecutors", "2") \
    .set("spark.driver.memory", "16g") \
    .set("spark.kryoserializer.buffer.max", "1g") \
    .set("spark.driver.cores", "32") \
    .set("spark.executor.cores", "8") \
    .set("spark.yarn.queue", "us9") \
    .set("spark.dynamicAllocation.maxExecutors", "32")

sparkContext = SparkContext.getOrCreate(conf=conf)
```

Has anyone seen this error before, or does anyone know how to solve it? Thanks in advance.
Created 09-13-2018 02:27 AM
Hi,
This is a known issue with the CDSW 1.3 release; please see the documentation about it:
I also see that you are creating a SparkContext object. That should still work, but you might be better off using the new Spark 2.x interfaces. You can see a few examples here, and a quick sketch follows the link:
https://www.cloudera.com/documentation/data-science-workbench/1-3-x/topics/cdsw_pyspark.html
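A minimal sketch of the SparkSession-based approach (the configuration values here are just examples, adjust them for your cluster):

```python
# Spark 2.x entry point: SparkSession wraps SparkContext, SQLContext and HiveContext.
from pyspark.sql import SparkSession

spark = SparkSession.builder \
    .appName("cdsw-example") \
    .config("spark.executor.memory", "12g") \
    .config("spark.executor.cores", "8") \
    .config("spark.yarn.queue", "us9") \
    .config("spark.dynamicAllocation.maxExecutors", "32") \
    .getOrCreate()

# The underlying SparkContext is still available if older APIs need it.
sc = spark.sparkContext
```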
Regards,
Peter
Created 09-14-2018 02:33 AM
Thank you so much! My problem has been solved.
Created 11-17-2018 09:53 AM
@peter_ableda Sorry to ask you here. I have installed CDSW 1.4 on my CDSW machine, and when I try to start a SparkSession or run any HDFS command I get an UnknownHostException for the Cloudera master hostname. I am very new to Cloudera, so I am not sure which part of the setup I am missing; I followed the PySpark setup (imported the template while creating the project and started a Python 2 session to run the PySpark job). It would be a great help if you could point out what is missing from my setup. Thanks in advance!
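Roughly what I am running when the error appears (a simplified sketch based on the PySpark template, not my exact code):

```python
# Starting the session already fails with an UnknownHostException
# that names the Cloudera master hostname.
from pyspark.sql import SparkSession

spark = SparkSession.builder \
    .appName("cdsw-test") \
    .getOrCreate()

spark.range(10).count()
```

Plain HDFS commands from the project terminal (for example `hdfs dfs -ls /`) fail with the same exception.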
