10-07-2018 11:58 PM
My main goal is to get the application ID after submitting a yarn-cluster job through Java code, so that I can drive further business operations with it.
I added "--conf=spark.extraListeners=Mylistener" to the submit arguments.
SparkListener works when I use Spark in standalone mode, but it doesn't work when I run Spark on a cluster over YARN. Is it possible for a SparkListener to work when running over YARN? If so, what steps should I take to enable that?
Here is the Mylistener class code:
public class Mylistener extends SparkListener {
    private static final Logger logger = LoggerFactory.getLogger(Mylistener.class);

    @Override
    public void onApplicationStart(SparkListenerApplicationStart sparkListenerApplicationStart) {
        Option<String> appId = sparkListenerApplicationStart.appId();
        EnvelopeSubmit.appId = appId.get();
        logger.info("====================start");
    }

    @Override
    public void onBlockManagerAdded(SparkListenerBlockManagerAdded blockManagerAdded) {
        logger.info("=====================add");
    }
}
Here is the Main class to submit the application:
public static void main(String[] args) {
    String jarpath = args[0];
    String childArg = args[1];
    System.out.println("jarpath:" + jarpath);
    System.out.println("childArg:" + childArg);
    System.setProperty("HADOOP_USER_NAME", "hdfs");
    String[] arg = {
        "--verbose=true",
        "--class=com.cloudera.labs.envelope.EnvelopeMain",
        "--master=yarn",
        "--deploy-mode=cluster",
        "--conf=spark.extraListeners=Mylistener",
        "--conf", "spark.eventLog.enabled=true",
        "--conf", "spark.yarn.jars=hdfs://192.168.6.188:8020/user/hdfs/lib/*",
        jarpath, childArg
    };
    SparkSubmit.main(arg);
}
10-08-2018 06:52 AM
I would recommend using SparkLauncher to submit your Envelope application to the cluster. That has a more structured API for configuring the application, and when you submit it then it will return you a SparkAppHandle that has a method for retrieving the app ID.
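A minimal sketch of what that looks like (the jar path and config argument below are placeholders, so substitute your own; SparkLauncher also needs SPARK_HOME set so it can find spark-submit):

import org.apache.spark.launcher.SparkAppHandle;
import org.apache.spark.launcher.SparkLauncher;

public class EnvelopeLaunch {
    public static void main(String[] args) throws Exception {
        // Launches the application in a child spark-submit process.
        SparkAppHandle handle = new SparkLauncher()
            .setAppResource("/path/to/envelope-full-0.5.0.jar")      // placeholder jar path
            .setMainClass("com.cloudera.labs.envelope.EnvelopeMain")
            .addAppArgs("/path/to/your/app.conf")                    // placeholder Envelope config
            .startApplication();

        // The app ID is populated once the application has been submitted;
        // poll for it, or register a SparkAppHandle.Listener instead of polling.
        while (handle.getAppId() == null && !handle.getState().isFinal()) {
            Thread.sleep(1000);
        }
        System.out.println("App ID: " + handle.getAppId());
    }
}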
10-08-2018 10:48 PM
I want to submit my Envelope application to the cluster via YARN. Right now I use "org.apache.spark.deploy.yarn.Client" to submit directly. If you have a better approach, please tell me.
Thanks.
public static void main(String[] s) throws Exception {
String[] args = new String[]{
"--jar", "build\\envelope-full\\target\\envelope-full-0.5.0.jar",
"--class", "com.cloudera.labs.envelope.EnvelopeMain",
"--arg", "hdfs://fj-c7-188.linewell.com:8020/user/hdfs/test.conf"
};
Configuration config = loadConfigFiles(HADOOP_SITE_FILES);
System.setProperty("HADOOP_USER_NAME", "hdfs");
System.setProperty("SPARK_YARN_MODE", "true");
System.setProperty("hdp.version", "2.6.1.0-129");
ClientArguments carg = new ClientArguments(args);
SparkConf sparkConf = new SparkConf();
sparkConf.set("spark.submit.deployMode", "cluster");
sparkConf.set("spark.driver.extraJavaOptions", "-Dhdp.version=2.6.1.0-129");
sparkConf.set("spark.executor.extraJavaOptions", "-Dhdp.version=2.6.1.0-129");
sparkConf.set("spark.yarn.jars","hdfs://192.168.6.188:8020/user/hdfs/lib/*");
sparkConf.set("spark.eventLog.enabled=","true");
Client client = new Client(carg,config,sparkConf);
System.out.println(client.submitApplication());
// new Client(carg, config, sparkConf).run();
}
10-09-2018 06:41 PM
It does! Just use the 'setMaster' and 'setDeployMode' methods like you would use '--master' and '--deploy-mode' on the command line.
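For example, a sketch building on the placeholders above, this time with a listener that fires as soon as YARN assigns the ID:

import org.apache.spark.launcher.SparkAppHandle;
import org.apache.spark.launcher.SparkLauncher;

public class YarnClusterLaunch {
    public static void main(String[] args) throws Exception {
        new SparkLauncher()
            .setAppResource("/path/to/envelope-full-0.5.0.jar")      // placeholder jar path
            .setMainClass("com.cloudera.labs.envelope.EnvelopeMain")
            .setMaster("yarn")          // equivalent of --master yarn
            .setDeployMode("cluster")   // equivalent of --deploy-mode cluster
            .startApplication(new SparkAppHandle.Listener() {
                @Override
                public void stateChanged(SparkAppHandle handle) {
                    // Fires on each state transition; the app ID is available
                    // once YARN has accepted the submission.
                    if (handle.getAppId() != null) {
                        System.out.println("App ID: " + handle.getAppId());
                    }
                }

                @Override
                public void infoChanged(SparkAppHandle handle) {
                }
            });

        // Crude wait so this JVM doesn't exit before the submission completes.
        Thread.sleep(60000);
    }
}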
Currently incubating in Cloudera Labs:
Envelope