Welcome to the Cloudera Community

Oozie Spark Action - jars upload issue

We are running an Oozie workflow with a Spark action in YARN mode on CDH 5.8.0. When a job starts, it spends a long time uploading the jars under '/opt/cloudera/parcels/CDH-5.8.0-1.cdh5.8.0.p0.42/lib/' to HDFS. This step takes approximately 5 minutes.

Here are several lines from the output log:

2016-08-08 19:10:30,312 INFO [main] org.apache.spark.deploy.yarn.Client: Uploading resource file:/opt/cloudera/parcels/CDH-5.8.0-1.cdh5.8.0.p0.42/lib/hadoop/hadoop-annotations.jar -> hdfs://ns/user/hdfs/.sparkStaging/application_1469502027340_0471/hadoop-annotations.jar
2016-08-08 19:10:31,921 INFO [main] org.apache.spark.deploy.yarn.Client: Uploading resource file:/opt/cloudera/parcels/CDH-5.8.0-1.cdh5.8.0.p0.42/lib/hadoop/hadoop-auth.jar -> hdfs://ns/user/hdfs/.sparkStaging/application_1469502027340_0471/hadoop-auth.jar
2016-08-08 19:10:32,911 INFO [main] org.apache.spark.deploy.yarn.Client: Uploading resource file:/opt/cloudera/parcels/CDH-5.8.0-1.cdh5.8.0.p0.42/lib/hadoop/hadoop-aws.jar -> hdfs://ns/user/hdfs/.sparkStaging/application_1469502027340_0471/hadoop-aws.jar
...
2016-08-08 19:12:14,041 INFO [main] org.apache.spark.deploy.yarn.Client: Uploading resource file:/opt/cloudera/parcels/CDH-5.8.0-1.cdh5.8.0.p0.42/lib/hadoop-hdfs/hadoop-hdfs-nfs.jar -> hdfs://ns/user/hdfs/.sparkStaging/application_1469502027340_0471/hadoop-hdfs-nfs.jar
2016-08-08 19:12:14,916 INFO [main] org.apache.spark.deploy.yarn.Client: Uploading resource file:/opt/cloudera/parcels/CDH-5.8.0-1.cdh5.8.0.p0.42/lib/hadoop-hdfs/hadoop-hdfs-tests.jar -> hdfs://ns/user/hdfs/.sparkStaging/application_1469502027340_0471/hadoop-hdfs-tests.jar
2016-08-08 19:12:17,184 INFO [main] org.apache.spark.deploy.yarn.Client: Uploading resource file:/opt/cloudera/parcels/CDH-5.8.0-1.cdh5.8.0.p0.42/lib/hadoop-hdfs/hadoop-hdfs.jar -> hdfs://ns/user/hdfs/.sparkStaging/application_1469502027340_0471/hadoop-hdfs.jar
2016-08-08 19:12:20,331 INFO [main] org.apache.spark.deploy.yarn.Client: Uploading resource file:/opt/cloudera/parcels/CDH-5.8.0-1.cdh5.8.0.p0.42/lib/hadoop-hdfs/hadoop-hdfs-nfs-2.6.0-cdh5.8.0.jar -> hdfs://ns/user/hdfs/.sparkStaging/application_1469502027340_0471/hadoop-hdfs-nfs-2.6.0-cdh5.8.0.jar
...
2016-08-08 19:12:40,483 INFO [main] org.apache.spark.deploy.yarn.Client: Uploading resource file:/opt/cloudera/parcels/CDH-5.8.0-1.cdh5.8.0.p0.42/lib/hadoop-yarn/hadoop-yarn-api.jar -> hdfs://ns/user/hdfs/.sparkStaging/application_1469502027340_0471/hadoop-yarn-api.jar
2016-08-08 19:12:41,400 INFO [main] org.apache.spark.deploy.yarn.Client: Uploading resource file:/opt/cloudera/parcels/CDH-5.8.0-1.cdh5.8.0.p0.42/lib/hadoop-yarn/hadoop-yarn-applications-distributedshell.jar -> hdfs://ns/user/hdfs/.sparkStaging/application_1469502027340_0471/hadoop-yarn-applications-distributedshell.jar
2016-08-08 19:12:42,386 INFO [main] org.apache.spark.deploy.yarn.Client: Uploading resource file:/opt/cloudera/parcels/CDH-5.8.0-1.cdh5.8.0.p0.42/lib/hadoop-yarn/hadoop-yarn-applications-unmanaged-am-launcher.jar -> hdfs://ns/user/hdfs/.sparkStaging/application_1469502027340_0471/hadoop-yarn-applications-unmanaged-am-launcher.jar
2016-08-08 19:12:43,615 INFO [main] org.apache.spark.deploy.yarn.Client: Uploading resource file:/opt/cloudera/parcels/CDH-5.8.0-1.cdh5.8.0.p0.42/lib/hadoop-yarn/hadoop-yarn-client.jar -> hdfs://ns/user/hdfs/.sparkStaging/application_1469502027340_0471/hadoop-yarn-client.jar
2016-08-08 19:12:44,632 INFO [main] org.apache.spark.deploy.yarn.Client: Uploading resource file:/opt/cloudera/parcels/CDH-5.8.0-1.cdh5.8.0.p0.42/lib/hadoop-yarn/hadoop-yarn-common.jar -> hdfs://ns/user/hdfs/.sparkStaging/application_1469502027340_0471/hadoop-yarn-common.jar
...
2016-08-08 19:13:50,199 INFO [main] org.apache.spark.deploy.yarn.Client: Uploading resource file:/opt/cloudera/parcels/CDH-5.8.0-1.cdh5.8.0.p0.42/lib/hadoop-mapreduce/apacheds-i18n-2.0.0-M15.jar -> hdfs://ns/user/hdfs/.sparkStaging/application_1469502027340_0471/apacheds-i18n-2.0.0-M15.jar
2016-08-08 19:13:51,934 INFO [main] org.apache.spark.deploy.yarn.Client: Uploading resource file:/opt/cloudera/parcels/CDH-5.8.0-1.cdh5.8.0.p0.42/lib/hadoop-mapreduce/apacheds-kerberos-codec-2.0.0-M15.jar -> hdfs://ns/user/hdfs/.sparkStaging/application_1469502027340_0471/apacheds-kerberos-codec-2.0.0-M15.jar
2016-08-08 19:13:53,658 INFO [main] org.apache.spark.deploy.yarn.Client: Uploading resource file:/opt/cloudera/parcels/CDH-5.8.0-1.cdh5.8.0.p0.42/lib/hadoop-mapreduce/api-asn1-api-1.0.0-M20.jar -> hdfs://ns/user/hdfs/.sparkStaging/application_1469502027340_0471/api-asn1-api-1.0.0-M20.jar
2016-08-08 19:13:55,297 INFO [main] org.apache.spark.deploy.yarn.Client: Uploading resource file:/opt/cloudera/parcels/CDH-5.8.0-1.cdh5.8.0.p0.42/lib/hadoop-mapreduce/api-util-1.0.0-M20.jar -> hdfs://ns/user/hdfs/.sparkStaging/application_1469502027340_0471/api-util-1.0.0-M20.jar
2016-08-08 19:13:56,768 INFO [main] org.apache.spark.deploy.yarn.Client: Uploading resource file:/opt/cloudera/parcels/CDH-5.8.0-1.cdh5.8.0.p0.42/lib/hadoop-mapreduce/commons-beanutils-1.7.0.jar -> hdfs://ns/user/hdfs/.sparkStaging/application_1469502027340_0471/commons-beanutils-1.7.0.jar
...
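
To quantify how long the upload phase takes, the timestamps on the "Uploading resource" lines can be compared. The following is only a minimal sketch, assuming the log format is exactly as pasted above; the function names (parse_uploads, upload_span_seconds) and the command-line usage are illustrative and not part of Oozie or Spark.

```python
import re
import sys
from datetime import datetime

# Matches the timestamp and the local source path of an upload line,
# assuming the log4j layout shown in the excerpt above.
UPLOAD_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) INFO .*"
    r"Uploading resource (\S+) ->"
)


def parse_uploads(lines):
    """Return a list of (timestamp, source_path) for each upload line."""
    uploads = []
    for line in lines:
        match = UPLOAD_RE.match(line)
        if match:
            stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S,%f")
            uploads.append((stamp, match.group(2)))
    return uploads


def upload_span_seconds(lines):
    """Seconds between the first and the last upload line (0.0 if fewer than two)."""
    uploads = parse_uploads(lines)
    if len(uploads) < 2:
        return 0.0
    return (uploads[-1][0] - uploads[0][0]).total_seconds()


if __name__ == "__main__":
    # Hypothetical usage: python upload_span.py <path-to-launcher-log>
    with open(sys.argv[1], encoding="utf-8") as handle:
        log_lines = handle.readlines()
    print(len(parse_uploads(log_lines)), "jars,", upload_span_seconds(log_lines), "seconds")
```

Run against only the first and last lines quoted above, this reports 206.456 seconds; the excerpt is truncated, so the full log would be expected to cover a longer span.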

Can we skip uploading the Hadoop jars to speed up the workflow?