Member since: 03-21-2017
Posts: 6
Kudos Received: 3
Solutions: 1
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 5285 | 03-21-2017 06:00 PM |
03-23-2017 04:44 PM
Awesome, @Ken Jiiii. hive-site.xml should be available across the cluster in /etc/spark/conf (which /usr/hdp/current/spark-client/conf points to), and the Spark client needs to be installed on every worker node for yarn-cluster mode to run, because the Spark driver can land on any worker node and expects a client installation with spark/conf there. If you are using Ambari, it takes care of making hive-site.xml available in /spark-client/conf/.
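As a minimal sketch of why the file placement matters (assuming Spark 1.x on HDP; the object name is invented for illustration): HiveContext reads hive-site.xml from the driver's classpath, so in yarn-cluster mode the file must exist on whichever worker node ends up hosting the driver.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

// Hypothetical check: if hive-site.xml is missing from /etc/spark/conf on
// the node hosting the driver, HiveContext silently falls back to a local
// embedded metastore and the expected databases will not be listed.
object HiveSiteCheck {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("HiveSiteCheck"))
    val hiveCtx = new HiveContext(sc)
    hiveCtx.sql("SHOW DATABASES").collect().foreach(println)
    sc.stop()
  }
}
```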
03-22-2017 06:31 AM
2 Kudos
local[*]

`new SparkConf().setMaster("local[2]")`

- Runs the job in local mode (a runnable sketch follows this list).
- Specifically used to test code on a small amount of data in a local environment.
- Does not provide the advantages of a distributed environment.
- The number in brackets is the number of CPU cores allocated to the local run; `*` means all available cores.
- Helps in debugging the code by applying breakpoints while running from Eclipse or IntelliJ.
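A minimal local-mode sketch (names are illustrative; assumes Spark 1.x with the RDD API): it can be run directly from an IDE with breakpoints, no cluster needed.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object LocalModeExample {
  def main(args: Array[String]): Unit = {
    // local[2] runs the driver and executors inside this one JVM with
    // 2 cores; local[*] would use every core on the machine instead.
    val conf = new SparkConf()
      .setAppName("LocalModeExample")
      .setMaster("local[2]")
    val sc = new SparkContext(conf)

    // A tiny word count: small in-memory data is exactly what local
    // mode is for, and each step can be stepped through in a debugger.
    val counts = sc.parallelize(Seq("a", "b", "a"))
      .map(word => (word, 1))
      .reduceByKey(_ + _)
    counts.collect().foreach(println)

    sc.stop()
  }
}
```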
yarn-client

`--master yarn --deploy-mode client`

YARN client mode: your driver program runs on the client machine where you type the command to submit the Spark application (which may not be a machine in the YARN cluster). Although the driver program runs on the client machine, the tasks are executed on the executors inside the NodeManagers of the YARN cluster.

yarn-cluster

`--master yarn --deploy-mode cluster`

This is the most advisable pattern for submitting your Spark jobs in production. YARN cluster mode: your driver program runs inside the YARN ApplicationMaster on one of the cluster's nodes rather than on the machine where you type the submit command, so the application keeps running even if the client disconnects (see the sketch below).
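A sketch of why the same jar works in either deploy mode (assuming Spark 1.x; the object name is made up for illustration): when the code does not call `setMaster`, the master and deploy mode come entirely from the spark-submit flags above.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical app with no setMaster() in code, so it can be launched as
//   spark-submit --master yarn --deploy-mode client  --class ClusterModeExample app.jar
// or
//   spark-submit --master yarn --deploy-mode cluster --class ClusterModeExample app.jar
// without any code change.
object ClusterModeExample {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("ClusterModeExample")
    val sc = new SparkContext(conf)
    // In client mode this println appears in your terminal; in cluster
    // mode it ends up in the driver's container log on a worker node.
    println(s"Master in effect: ${sc.master}")
    sc.stop()
  }
}
```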