cdsw spark2 configuration issue

New Contributor

Hi,

I'm facing issues when submitting a job or running a command from the workbench.

I followed the CDSW installation guide. The steps I took can be summarized as follows:

 

Switch to java8 on the cdh cluster and on the cdsw machine

Deploy spark2 using parcel & csd on a cdh cluster

Validate using SparkPi: everything is OK

Set up CDSW on a dedicated node

Download & configure CDSW

cdsw init ran without error

 

The problem :

I'm able to access the workbench, but when I try to run any template, such as analysis.R, I get the following message after 20 seconds of task inactivity:

 

 

Waiting for Spark configuration...
Have you fully deployed client configuration to your CDSW nodes?

 

The task then stays idle until it is killed automatically.

I looked at the Spark history. No job was displayed, whether coming from CDSW or from anywhere else.

 

I was wondering if I skipped something related to the Spark cluster configuration for CDSW.

I read the CDSW installation guide carefully, but I need hints for gathering additional debug information or for correctly configuring CDSW with Spark2.

 

For information: I tried copying the Spark2 configuration files /etc/spark2/conf/spark-defaults.conf and /etc/spark2/conf/spark-env.sh from the CDH worker nodes to the CDSW node.

Unfortunately, this made no difference.
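
When checking what a spark-defaults.conf on the CDSW node actually contains, a small parser can help. The following is only a sketch: it assumes the file uses whitespace-separated `key value` lines with `#` comments, and the helper names `parse_spark_defaults` and `load_spark_defaults` are hypothetical, not part of CDSW or Spark.

```python
from pathlib import Path


def parse_spark_defaults(text):
    """Parse spark-defaults.conf content into a dict of key -> value.

    Assumption: each setting is a 'key value' pair separated by whitespace;
    blank lines and lines starting with '#' are skipped.
    """
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)  # split on the first run of whitespace
        key = parts[0]
        value = parts[1].strip() if len(parts) > 1 else ""
        settings[key] = value
    return settings


def load_spark_defaults(path="/etc/spark2/conf/spark-defaults.conf"):
    """Return the parsed settings, or an empty dict if the file is absent."""
    p = Path(path)
    return parse_spark_defaults(p.read_text()) if p.is_file() else {}
```

For example, `load_spark_defaults().get("spark.master")` shows which master the node's configuration points at. An empty dict means no file was found at that path.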

 

Any feedback is welcome.

Regards

1 ACCEPTED SOLUTION

Rising Star

Hi,

 

Did you add the dedicated CDSW host to the cluster in CM? 

 

From the documentation:

"Cloudera Data Science Workbench hosts must be added to your CDH cluster as gateway hosts, with gateway roles properly configured."

https://www.cloudera.com/documentation/data-science-workbench/latest/topics/cdsw_install.html#config...

 

Regards,

Peter

New Contributor
Indeed, the CDSW host had already been added in Cloudera Manager, but it still needed a Spark gateway deployment.

Thanks
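
After deploying the gateway configuration, a quick sanity check on the CDSW node could look like the sketch below. It only tests that the two files named in the question exist under /etc/spark2/conf. The helper `missing_client_configs` is a hypothetical name, and file presence is a necessary condition at best: manually copied files were present in the original attempt and did not fix the problem.

```python
import os

# File names taken from the question; other files may also be required.
EXPECTED_FILES = ("spark-defaults.conf", "spark-env.sh")


def missing_client_configs(conf_dir="/etc/spark2/conf", expected=EXPECTED_FILES):
    """Return the expected config file names that are not present in conf_dir."""
    return [
        name
        for name in expected
        if not os.path.isfile(os.path.join(conf_dir, name))
    ]
```

Calling `missing_client_configs()` on the CDSW host returns an empty list when both files are in place, and the names of any missing files otherwise.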