Created on 07-17-2020 01:08 PM - last edited on 07-19-2020 11:24 PM by VidyaSargur
Guys,
Looking at the CCA175 syllabus, I’m having difficulty finding info on how to supply command-line options to change an application's configuration.
I thought the whole raison d'être of Cloudera was that they do all the configuring while analysts do analyst things and don't have to do any configuring?!
Any suggestions as to where I can find the answer to this would be much appreciated.
Please no mods or bots suggesting to read lame links that don’t help.
Thanks guys!
Created 07-17-2020 04:04 PM
I think what is being said and misunderstood here is runtime variables, as in Hive, where you can personalize your environment by executing a file at startup.
See this example; there are many more examples online.
There are a total of three namespaces available for holding variables (hiveconf, system, and env).
If you do not provide a namespace, as below, the variable var will be stored in the hiveconf namespace.
set var="default_namespace";
So, to access it you need to specify the hiveconf namespace:
select ${hiveconf:var};
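If the goal is to supply the value on the command line instead of inside the session, the Hive CLI also accepts --hiveconf key=value at launch. This is just a sketch, and the variable name var and the value cli_value are only placeholders:
hive --hiveconf var=cli_value -e 'select "${hiveconf:var}";'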
Hope that helps
Created 07-17-2020 11:05 PM
Shelton,
Thanks for the response. I am unable to understand what it is that you have written.
My question relates to the CCA175 exam syllabus, where they say that all candidates should be familiar with all aspects of generating a result, not just writing code.
One example they give is that one should know how to increase available memory. I don't know how to increase available memory. Do you?
They give no other examples so I don't know what to search for!
FYI, I will not be going anywhere near the Hive CLI. In the exam I will be working exclusively from the spark-shell & using Spark SQL via a SparkSession
Thanks for your help & effort, though unfortunately it hasn't been of use to me at this time.
Could you perhaps give another example?
Thanks!
Created 07-18-2020 01:16 AM
Sorry for misunderstanding you, but your earlier posting was neither specific nor clear about being limited to spark-shell etc. Increasing memory for Spark interactively is done by using the --driver-memory option to set the memory for the driver process.
Here are simple examples of standalone [1 node] and cluster executions; these are version-specific.
Run spark-shell on Spark installed in standalone mode, Spark version 1.2.0:
./spark-shell --driver-memory 2g
Run spark-shell on Spark installed on the cluster:
./bin/spark-shell --executor-memory 4g
In Spark 1.2.0 you can set memory and cores by giving the following arguments to spark-shell:
./spark-shell --driver-memory 10G --executor-memory 15G --executor-cores 8
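In more recent Spark versions the same idea generalizes: besides the dedicated memory and core flags, any Spark property can be passed on the command line with --conf key=value. The values below are purely illustrative, not recommendations:
spark-shell --driver-memory 4g --conf spark.executor.memory=4g --conf spark.sql.shuffle.partitions=10
The same --conf options are accepted by spark-submit as well.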
Using Spark version 2.4.5, run the application locally on 8 cores:
./bin/spark-submit \
--class org.apache.spark.examples.SparkPi \
--master local[8] \
/path/to/examples.jar \
100
Run on a Spark standalone cluster in client deploy mode:
./bin/spark-submit \
--class org.apache.spark.examples.SparkPi \
--master spark://207.184.161.138:7077 \
--executor-memory 20G \
--total-executor-cores 100 \
/path/to/examples.jar \
1000
Run on a Spark standalone cluster in cluster deploy mode with supervise:
./bin/spark-submit \
--class org.apache.spark.examples.SparkPi \
--master spark://207.184.161.138:7077 \
--deploy-mode cluster \
--supervise \
--executor-memory 20G \
--total-executor-cores 100 \
/path/to/examples.jar \
1000
Run on a YARN cluster; you will need to export HADOOP_CONF_DIR=XXX first (--deploy-mode can be client for client mode):
./bin/spark-submit \
--class org.apache.spark.examples.SparkPi \
--master yarn \
--deploy-mode cluster \
--executor-memory 20G \
--num-executors 50 \
/path/to/examples.jar \
1000
So in the above examples, you have to adjust the parameters below; you can also keep them in a properties file, as sketched after the list:
--executor-memory
--total-executor-cores
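If you would rather not retype these flags every time, the same properties can be kept in a file and passed with --properties-file. This is only a sketch; the file name my-spark.conf and the values are placeholders:
spark-shell --properties-file my-spark.conf
where my-spark.conf holds one property per line, for example:
spark.driver.memory 4g
spark.executor.memory 4g
spark.cores.max 8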
To get help on spark-shell options:
spark-shell --help
Hope that answers your question on how to interactively increase available memory. To be able to run rings around Spark, there is no better source than the Spark docs.
Happy hadooping
Created 07-18-2020 01:43 AM
Thanks for your response Shelton.
I would like to gently remind you & the admins of something. When you are a beginner, the recommended documentation is not always as helpful as it seems to those who already have an understanding of what they’re doing!
I would appreciate input from anyone else that is looking to sit or has sat the CCA175 exam.
Aside from increasing the available memory, what else would you consider, or have you considered, to optimise performance using spark-shell or PySpark?
I’m curious as to why this is not made clear by Cloudera. For the exams, we have access to the same platform, CDH 6.1. Surely one setting should do it.
For example, use maximum memory, partition by column name, etc.
Thanks guys!
Created on 07-18-2020 02:34 AM - edited 07-18-2020 03:20 AM
Sorry for your frustration, but neither I nor the admins of this platform are responsible for the contents of the CCA175 exam; here we try to help. Maybe you will need to address your case to certification@cloudera.com.
I have found a Hadoop developer, Ashwin Rangarajan, who has shared some good content below:
https://ashwin.cloud/blog/cloudera-cca175-spark-developer/
Hope that helps
Created 07-18-2020 02:59 AM
Thanks for your efforts Shelton.