Created 04-11-2022 09:11 PM
Hi,
I have successfully installed the CDP Private Cloud trial version on AWS using the instructions below.
https://www.cloudera.com/tutorials/how-to-create-a-cdp-private-cloud-base-development-cluster.html
Question:
How do I access the CDP components (Spark, Hive, etc.) from my local laptop through the CLI?
As part of the installation, 4 hosts were created in AWS, but I do not know how to access those hosts through the CLI from my laptop. I am using Ubuntu on my laptop.
I am able to access all 4 AWS hosts through SSH using the command below, but after logging into a host I cannot invoke pyspark because I get the error below.
ssh command:
ssh -i '/home/<localuser>/.ssh/cdp-trial-key.pem' centos@<aws_ip_address>
Error:
org.apache.hadoop.security.AccessControlException: Permission denied: user=centos, access=WRITE, inode="/user":hdfs:supergroup:drwxr-xr-x
My objective:
I need to learn all the components in CDP, write a few lines of code using Python and Spark, access HDFS, etc. But I am new to this setup. Could you please help with solutions?
Thank you.
Created 04-12-2022 11:17 AM
Hi @Ragavend,
Happy to hear that you are exploring CDP Private Cloud and taking on the learning of the platform in a lab environment. To answer your questions:
1. There are a few steps needed to access the CDP Private Cloud CLI. Instructions are here: https://docs.cloudera.com/management-console/1.3.3/private-cloud-cli/topics/mc-private-cloud-cli-cli.... Note that you will need to allow external connections to your AWS EC2 instances in order to be able to issue commands from your laptop to the CDP cluster. This assumes you are talking about the CDP CLI. If you mean the AWS CLI (a different tool entirely), please see the many AWS tutorials available.
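As a rough sketch of step 1, the CDP CLI client is distributed as a Python package and is typically installed on the laptop and then pointed at the cluster's control plane. The package name (`cdpcli`) and the `cdp configure` subcommand are assumptions based on Cloudera's documentation; verify the exact steps for your CDP version against the linked instructions.

```shell
# Hypothetical sketch, assuming the cdpcli Python package from PyPI.
# Install the CDP CLI client on the local (Ubuntu) laptop:
pip install cdpcli

# Interactive configuration: prompts for the CDP access key ID and
# private key generated in the Management Console. For Private Cloud,
# commands are then directed at your control plane with --endpoint-url
# (URL is a placeholder for your own deployment).
cdp configure
```

Note that the EC2 security groups must allow inbound traffic from your laptop's IP to the control plane endpoint, as the answer above mentions.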
2. In order to run pyspark, the user executing the job needs to be able to create a log directory on HDFS. So, instead of running your command as the default centos system user, try running it as your CDP admin user.
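A common workaround for the `AccessControlException` above, if you do want to keep using the centos user, is to create an HDFS home directory for it. This is a minimal sketch assuming you have shell access to a cluster host and can act as the `hdfs` superuser via sudo (on a Kerberized cluster you would instead kinit as the HDFS principal):

```shell
# Create an HDFS home directory for the centos user and hand over
# ownership, so pyspark can write its staging/log files under /user/centos.
sudo -u hdfs hdfs dfs -mkdir -p /user/centos
sudo -u hdfs hdfs dfs -chown centos:centos /user/centos
```

After this, re-running pyspark as centos should no longer hit the WRITE permission error on `/user`.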
Hope this helps.
Regards,
Alex