Member since 07-13-2018 · 28 Posts · 1 Kudos Received · 0 Solutions
01-29-2019 09:13 AM
Note: The process below is easier if the node is a gateway node, since the correct Spark version and configuration directories will then be readily available for mounting into the docker container.

The quick and dirty way is to have an installation of Spark matching your cluster's major version installed or mounted in the docker container. You will also need to mount the yarn and hadoop configuration directories in the docker container. Mounting these saves you from setting a ton of config on submission, e.g. "spark.hadoop.yarn.resourcemanager.hostname","XXX". Often both can be set to the same value: /opt/cloudera/parcels/SPARK2/lib/spark2/conf/yarn-conf.

The SPARK_CONF_DIR, HADOOP_CONF_DIR and YARN_CONF_DIR environment variables need to be set if using spark-submit. If using SparkLauncher, they can be set like so:

    import scala.collection.JavaConverters._
    import org.apache.spark.launcher.SparkLauncher

    val env = Map(
      "HADOOP_CONF_DIR" -> "/example/hadoop/path",
      "YARN_CONF_DIR" -> "/example/yarn/path"
    )
    val launcher = new SparkLauncher(env.asJava).setSparkHome("/path/to/mounted/spark")

If submitting to a kerberized cluster, the easiest way is to mount a keytab file and /etc/krb5.conf in the docker container, then set the principal and keytab using spark.yarn.principal and spark.yarn.keytab, respectively.

For ports, port 8032 on the Spark master (the YARN ResourceManager port) definitely needs to be open to traffic from the docker node. I am not sure if this is the complete list of ports - could another user verify?
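Putting the steps above together, a minimal sketch of the docker invocation might look like the following. The image name, the jar path, the class name, and the keytab/principal values are all hypothetical placeholders; the parcel and yarn-conf paths are the ones mentioned above and may differ on your cluster.

```shell
# Sketch only: mount a matching Spark install, the yarn-conf directory,
# krb5.conf, and a keytab into the container, then point the *_CONF_DIR
# variables at the mounted config before calling spark-submit.
docker run --rm \
  -v /opt/cloudera/parcels/SPARK2/lib/spark2:/opt/spark:ro \
  -v /opt/cloudera/parcels/SPARK2/lib/spark2/conf/yarn-conf:/etc/spark/yarn-conf:ro \
  -v /etc/krb5.conf:/etc/krb5.conf:ro \
  -v /path/to/user.keytab:/etc/user.keytab:ro \
  -e SPARK_CONF_DIR=/etc/spark/yarn-conf \
  -e HADOOP_CONF_DIR=/etc/spark/yarn-conf \
  -e YARN_CONF_DIR=/etc/spark/yarn-conf \
  my-spark-client-image \
  /opt/spark/bin/spark-submit \
    --master yarn \
    --conf spark.yarn.principal=user@EXAMPLE.COM \
    --conf spark.yarn.keytab=/etc/user.keytab \
    --class com.example.App /app/app.jar
```

With the config directories mounted and the environment variables set this way, spark-submit picks up the ResourceManager address from yarn-conf, so none of the spark.hadoop.yarn.* values need to be passed on the command line.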
01-24-2019 08:43 PM
@bgooley Hi, I have a requirement to install CDH 5.11 in my cluster. I followed your point above and installed the latest Cloudera Manager, after which the cluster was created with CDH 5.16.1 (the default mentioned in the Cloudera docs). I then uninstalled the CDH 5.16.1 parcel, added the parcel repo for CDH 5.11, and installed it. Now my cluster is active with CDH 5.11, which is the only version I need. Could you please confirm whether what I did is correct, or do I need to follow any other instructions to install CDH 5.11? Sorry to add to the old thread. Thanks in advance!
11-26-2018 08:50 AM
@peter_ableda Thanks, Peter, for your time, and have a great day!
11-17-2018 09:53 AM
@peter_ableda Sorry to ask, but I have installed CDSW 1.4 on my CDSW machine, and when I try to start a SparkSession or run any HDFS command I get an UnknownHostException for the Cloudera master hostname. I am very new to Cloudera, so I am not sure which setup step I am missing; I followed the PySpark setup (importing the template while creating the project and starting the Python 2 environment to run the PySpark job). It would be a great help if you could point out what is missing from my setup. Thanks in advance!!!
10-14-2018 11:17 AM
Thanks, Boris, for the information. Will try the same.
07-17-2018 09:11 AM
@bgooley Hi, thanks a ton for all your responses to my queries. I just deployed my application jar (after upgrading Spark 1.6 to 2.2 using the CSD) and I can see it is working fine, though I have only tested a few basic scenarios. Let me create it as a service and hope it works fine as well.