Created on 07-17-2020 01:08 PM - last edited on 07-19-2020 11:24 PM by VidyaSargur
Guys,
Looking at the CCA175 syllabus, I’m having difficulty finding info on how to supply command-line options to change an application's configuration.
I thought the whole raison d'être of Cloudera was that they do all the configuring while analysts do analyst things and don't have to do any configuring?!
Any suggestions as to where I can find the answer to this would be much appreciated.
Please no mods or bots suggesting to read lame links that don’t help.
Thanks guys!
Created 07-17-2020 04:04 PM
I think what is being said and misunderstood here is runtime variables, as in Hive, where you can personalize your environment by executing a file at startup.
See this example; there are many more examples online.
There are a total of three namespaces available for holding variables (hiveconf, system, and env).
If you do not provide a namespace, as below, the variable var will be stored in the hiveconf namespace.
set var="default_namespace";
So, to access it you need to specify the hiveconf namespace:
select ${hiveconf:var};
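If the goal is to supply the value on the command line instead of inside the session, the Hive CLI also accepts --hiveconf key=value at launch. This is just a sketch, and the variable name var and the value cli_value are only placeholders:
hive --hiveconf var=cli_value -e 'select "${hiveconf:var}";'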
Hope that helps
Created 07-17-2020 11:05 PM
Shelton,
Thanks for the response. I am unable to understand what it is that you have written.
My question relates to the CCA175 exam syllabus, where they say that all candidates should be familiar with all aspects of generating a result, not just writing code.
One example they give is that one should know how to increase available memory. I don't know how to increase available memory. Do you?
They give no other examples so I don't know what to search for!
FYI, I will not be going anywhere near the Hive CLI. In the exam I will be working exclusively from the spark-shell & using Spark SQL via a SparkSession
Thanks for your help & effort, though unfortunately it hasn't been of use to me at this time.
Could you perhaps give another example?
Thanks!
Created 07-18-2020 01:16 AM
Sorry for misunderstanding you, but your earlier posting was neither specific nor clear about being limited to spark-shell etc. Increasing memory for Spark interactively is done by using the --driver-memory option to set the memory for the driver process.
Here are simple examples of standalone [1 node] and cluster executions; these are version-specific.
Run spark-shell on Spark installed in standalone mode, Spark version 1.2.0:
./spark-shell --driver-memory 2g
Run spark-shell on Spark installed on the cluster:
./bin/spark-shell --executor-memory 4g
In Spark 1.2.0 you can set memory and cores by giving the following arguments to spark-shell:
./spark-shell --driver-memory 10G --executor-memory 15G --executor-cores 8
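In more recent Spark versions the same idea generalizes: besides the dedicated memory and core flags, any Spark property can be passed on the command line with --conf key=value. The values below are purely illustrative, not recommendations:
spark-shell --driver-memory 4g --conf spark.executor.memory=4g --conf spark.sql.shuffle.partitions=10
The same --conf options are accepted by spark-submit as well.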
Using Spark version 2.4.5, run the application locally on 8 cores:
./bin/spark-submit \
--class org.apache.spark.examples.SparkPi \
--master local[8] \
/path/to/examples.jar \
100
Run on a Spark standalone cluster in client deploy mode:
./bin/spark-submit \
--class org.apache.spark.examples.SparkPi \
--master spark://207.184.161.138:7077 \
--executor-memory 20G \
--total-executor-cores 100 \
/path/to/examples.jar \
1000
Run on a Spark standalone cluster in cluster deploy mode with supervise:
./bin/spark-submit \
--class org.apache.spark.examples.SparkPi \
--master spark://207.184.161.138:7077 \
--deploy-mode cluster \
--supervise \
--executor-memory 20G \
--total-executor-cores 100 \
/path/to/examples.jar \
1000
Run on a YARN cluster; you will need to export HADOOP_CONF_DIR=XXX first (--deploy-mode can be client for client mode):
./bin/spark-submit \
--class org.apache.spark.examples.SparkPi \
--master yarn \
--deploy-mode cluster \
--executor-memory 20G \
--num-executors 50 \
/path/to/examples.jar \
1000
So in the above examples, you have to adjust the parameters below; you can also keep them in a properties file, as sketched after the list:
--executor-memory
--total-executor-cores
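If you would rather not retype these flags every time, the same properties can be kept in a file and passed with --properties-file. This is only a sketch; the file name my-spark.conf and the values are placeholders:
spark-shell --properties-file my-spark.conf
where my-spark.conf holds one property per line, for example:
spark.driver.memory 4g
spark.executor.memory 4g
spark.cores.max 8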
To get help on spark-shell options:
spark-shell --help
Hope that answers your question on how to interactively increase available memory. To be able to run rings around Spark, there is no better source than the Spark docs.
Happy hadooping
Created 07-18-2020 01:43 AM
Thanks for your response Shelton.
I would like to gently remind you & the admins of something. When you are a beginner, the recommended documentation is not always as helpful as it seems to those who already have an understanding of what they’re doing!
I would appreciate input from anyone else that is looking to sit or has sat the CCA175 exam.
Aside from increasing the available memory, what else would you consider, or have you considered, to optimise performance using spark-shell or PySpark?
I’m curious as to why this is not made clear by Cloudera. For the exams, we have access to the same platform, CDH 6.1. Surely one setting should do it.
For example, use maximum memory, partition by column name, etc.
Thanks guys!
Created on 07-18-2020 02:34 AM - edited 07-18-2020 03:20 AM
Sorry for your frustration, but neither I nor the admins of this platform are responsible for the contents of the CCA175 exam; here we try to help. Maybe you will need to address your case to certification@cloudera.com.
I have found a Hadoop developer, Ashwin Rangarajan, who has shared some good content below:
https://ashwin.cloud/blog/cloudera-cca175-spark-developer/
Hope that helps
Created 07-18-2020 02:59 AM
Thanks for your efforts Shelton.