
Connecting to hdfs/creating the sparksession from cdsw

Contributor

Hi Team,

I have installed CDSW successfully, but I get the error below when I try to run an hdfs command or create a SparkSession from the CDSW terminal. Any idea what I am missing in the setup? Thanks in advance!

Error:

hdfs dfs -put data/sample_text_file.txt /tmp
-put: java.net.UnknownHostException: clouderamaster.<domain>.com

 

clouderamaster.<domain>.com is my CDH master server.

cdsw.<domain>.com is the CDSW master (the host where I run the hdfs command and create the SparkSession from the interactive terminal).

Super Collaborator

Hi,

 

CDSW runs an overlay network on top of your CDSW hosts, and the session pods get their IPs from it (100.66.x.x).

Based on your description, DNS resolution fails from inside the container even though it works on the host. This can happen when /etc/resolv.conf lists multiple nameservers and some of them cannot resolve clouderamaster. Either find out which nameserver can resolve your host and drop the others, or make sure that every nameserver can resolve clouderamaster.

I like to use the `dig @<nameserver> clouderamaster.<domain>.com` command to test these.
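To run that test against every nameserver a session actually uses, you can pull the nameserver list from /etc/resolv.conf inside the CDSW session. Below is a minimal Python sketch that assumes the standard `nameserver <address>` line format. The function names and example hostname are hypothetical, and the sketch only builds the `dig` commands; it does not query DNS itself.

```python
def parse_nameservers(resolv_conf_text):
    """Return the nameserver addresses listed in resolv.conf content, in order."""
    servers = []
    for raw in resolv_conf_text.splitlines():
        line = raw.strip()
        # Skip blank lines and comment lines (resolv.conf uses '#' or ';').
        if not line or line.startswith(("#", ";")):
            continue
        fields = line.split()
        if fields[0] == "nameserver" and len(fields) >= 2:
            servers.append(fields[1])
    return servers


def dig_commands(host, servers):
    """Build one `dig @server host` command per nameserver."""
    return [f"dig @{ns} {host}" for ns in servers]


def dig_commands_from_file(host, path="/etc/resolv.conf"):
    """Read resolv.conf (default path assumed) and build the dig commands."""
    with open(path) as f:
        return dig_commands(host, parse_nameservers(f.read()))
```

For example, `for cmd in dig_commands_from_file("clouderamaster.<domain>.com"): print(cmd)` run in a CDSW session prints one `dig` command per nameserver. Any nameserver whose answer lacks the CDH master's address is a candidate to fix or drop.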

 

Regards,

Peter

Contributor

@peter_ableda Hi Peter, when you say we need to add the DNS entry for the master host, do you mean the DNS entry for the clouderamaster host or for the CDSW master?

So far I have added the DNS entry for the CDSW master host. Also, do we need to add the extra dot (.) after the hostname, as shown in the documentation (*.cdsw.lab.test.com. / cdsw.lab.test.com.)? Sorry, I am a little confused by the docs.

Super Collaborator

The original issue you reported was an UnknownHostException on the clouderamaster.

 

hdfs dfs -put data/sample_text_file.txt /tmp
-put: java.net.UnknownHostException: clouderamaster.<domain>.com

 

You need to make sure that this host can be resolved (both forward/reverse) from inside a CDSW session via DNS.
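One way to check both directions from inside a CDSW session is a small Python sketch using the standard `socket` module. It does a forward lookup of the name, then a reverse lookup of the returned address. This is an illustration under stated assumptions, not a CDSW tool. The function name is hypothetical. It checks only the first IPv4 address and the primary reverse name. The resolver functions are parameters only so the logic can be exercised without a live DNS server.

```python
import socket


def check_forward_reverse(host,
                          forward=socket.gethostbyname,
                          reverse=socket.gethostbyaddr):
    """Resolve host to an IPv4 address, then resolve that address back to a name.

    Returns (ip, reverse_name, matches). A failed lookup raises
    socket.gaierror or socket.herror instead of returning.
    """
    ip = forward(host)             # forward lookup: name -> IPv4 address
    reverse_name = reverse(ip)[0]  # reverse lookup: IP -> primary host name
    # Compare case-insensitively and ignore a trailing root dot.
    matches = reverse_name.rstrip(".").lower() == host.rstrip(".").lower()
    return ip, reverse_name, matches
```

In a session, if `check_forward_reverse("clouderamaster.<domain>.com")` raises `socket.gaierror`, the forward lookup failed. That is the same kind of failure hdfs reports as `UnknownHostException`. A `False` in the last position means the reverse (PTR) record does not point back to that name.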

 

Since you can start a CDSW session and interact with it, the DNS entry for the CDSW master is already configured properly.

 

Regards,

Peter

Contributor

@peter_ableda Regarding "You need to make sure that this host can be resolved (both forward/reverse) from inside a CDSW session via DNS": does that mean we need to add another DNS entry for the CDH master host (clouderamaster.lab.test.com) so that it is accessible from the CDSW master host?

Super Collaborator

Yes.

Contributor

@peter_ableda Thanks Peter. I can now submit the Spark job from the CDSW master. Does Cloudera provide user-level isolation for CDSW projects and content? Otherwise different users could disturb or edit the same content.

Super Collaborator

We have a collaboration page in the documentation:

https://www.cloudera.com/documentation/data-science-workbench/latest/topics/cdsw_collaborate.html

 

We also have a page about Kerberos authentication:

https://www.cloudera.com/documentation/data-science-workbench/latest/topics/cdsw_kerberos.html

 

I hope this answers your question.

 

Regards,

Peter

 

Contributor

@peter_ableda Thanks, Peter, for your time, and have a great day!