
Oozie Spark Action - jars upload issue


We are using an Oozie workflow with a Spark action in YARN mode on CDH 5.8.0. When a job starts, it spends a long time uploading the jars under '/opt/cloudera/parcels/CDH-5.8.0-1.cdh5.8.0.p0.42/lib/'. This step takes approximately 5 minutes.

Here are several lines from the output log:

2016-08-08 19:10:30,312 INFO [main] org.apache.spark.deploy.yarn.Client: Uploading resource file:/opt/cloudera/parcels/CDH-5.8.0-1.cdh5.8.0.p0.42/lib/hadoop/hadoop-annotations.jar -> hdfs://ns/user/hdfs/.sparkStaging/application_1469502027340_0471/hadoop-annotations.jar
2016-08-08 19:10:31,921 INFO [main] org.apache.spark.deploy.yarn.Client: Uploading resource file:/opt/cloudera/parcels/CDH-5.8.0-1.cdh5.8.0.p0.42/lib/hadoop/hadoop-auth.jar -> hdfs://ns/user/hdfs/.sparkStaging/application_1469502027340_0471/hadoop-auth.jar
2016-08-08 19:10:32,911 INFO [main] org.apache.spark.deploy.yarn.Client: Uploading resource file:/opt/cloudera/parcels/CDH-5.8.0-1.cdh5.8.0.p0.42/lib/hadoop/hadoop-aws.jar -> hdfs://ns/user/hdfs/.sparkStaging/application_1469502027340_0471/hadoop-aws.jar
...
2016-08-08 19:12:14,041 INFO [main] org.apache.spark.deploy.yarn.Client: Uploading resource file:/opt/cloudera/parcels/CDH-5.8.0-1.cdh5.8.0.p0.42/lib/hadoop-hdfs/hadoop-hdfs-nfs.jar -> hdfs://ns/user/hdfs/.sparkStaging/application_1469502027340_0471/hadoop-hdfs-nfs.jar
2016-08-08 19:12:14,916 INFO [main] org.apache.spark.deploy.yarn.Client: Uploading resource file:/opt/cloudera/parcels/CDH-5.8.0-1.cdh5.8.0.p0.42/lib/hadoop-hdfs/hadoop-hdfs-tests.jar -> hdfs://ns/user/hdfs/.sparkStaging/application_1469502027340_0471/hadoop-hdfs-tests.jar
2016-08-08 19:12:17,184 INFO [main] org.apache.spark.deploy.yarn.Client: Uploading resource file:/opt/cloudera/parcels/CDH-5.8.0-1.cdh5.8.0.p0.42/lib/hadoop-hdfs/hadoop-hdfs.jar -> hdfs://ns/user/hdfs/.sparkStaging/application_1469502027340_0471/hadoop-hdfs.jar
2016-08-08 19:12:20,331 INFO [main] org.apache.spark.deploy.yarn.Client: Uploading resource file:/opt/cloudera/parcels/CDH-5.8.0-1.cdh5.8.0.p0.42/lib/hadoop-hdfs/hadoop-hdfs-nfs-2.6.0-cdh5.8.0.jar -> hdfs://ns/user/hdfs/.sparkStaging/application_1469502027340_0471/hadoop-hdfs-nfs-2.6.0-cdh5.8.0.jar
...
2016-08-08 19:12:40,483 INFO [main] org.apache.spark.deploy.yarn.Client: Uploading resource file:/opt/cloudera/parcels/CDH-5.8.0-1.cdh5.8.0.p0.42/lib/hadoop-yarn/hadoop-yarn-api.jar -> hdfs://ns/user/hdfs/.sparkStaging/application_1469502027340_0471/hadoop-yarn-api.jar
2016-08-08 19:12:41,400 INFO [main] org.apache.spark.deploy.yarn.Client: Uploading resource file:/opt/cloudera/parcels/CDH-5.8.0-1.cdh5.8.0.p0.42/lib/hadoop-yarn/hadoop-yarn-applications-distributedshell.jar -> hdfs://ns/user/hdfs/.sparkStaging/application_1469502027340_0471/hadoop-yarn-applications-distributedshell.jar
2016-08-08 19:12:42,386 INFO [main] org.apache.spark.deploy.yarn.Client: Uploading resource file:/opt/cloudera/parcels/CDH-5.8.0-1.cdh5.8.0.p0.42/lib/hadoop-yarn/hadoop-yarn-applications-unmanaged-am-launcher.jar -> hdfs://ns/user/hdfs/.sparkStaging/application_1469502027340_0471/hadoop-yarn-applications-unmanaged-am-launcher.jar
2016-08-08 19:12:43,615 INFO [main] org.apache.spark.deploy.yarn.Client: Uploading resource file:/opt/cloudera/parcels/CDH-5.8.0-1.cdh5.8.0.p0.42/lib/hadoop-yarn/hadoop-yarn-client.jar -> hdfs://ns/user/hdfs/.sparkStaging/application_1469502027340_0471/hadoop-yarn-client.jar
2016-08-08 19:12:44,632 INFO [main] org.apache.spark.deploy.yarn.Client: Uploading resource file:/opt/cloudera/parcels/CDH-5.8.0-1.cdh5.8.0.p0.42/lib/hadoop-yarn/hadoop-yarn-common.jar -> hdfs://ns/user/hdfs/.sparkStaging/application_1469502027340_0471/hadoop-yarn-common.jar
...
2016-08-08 19:13:50,199 INFO [main] org.apache.spark.deploy.yarn.Client: Uploading resource file:/opt/cloudera/parcels/CDH-5.8.0-1.cdh5.8.0.p0.42/lib/hadoop-mapreduce/apacheds-i18n-2.0.0-M15.jar -> hdfs://ns/user/hdfs/.sparkStaging/application_1469502027340_0471/apacheds-i18n-2.0.0-M15.jar
2016-08-08 19:13:51,934 INFO [main] org.apache.spark.deploy.yarn.Client: Uploading resource file:/opt/cloudera/parcels/CDH-5.8.0-1.cdh5.8.0.p0.42/lib/hadoop-mapreduce/apacheds-kerberos-codec-2.0.0-M15.jar -> hdfs://ns/user/hdfs/.sparkStaging/application_1469502027340_0471/apacheds-kerberos-codec-2.0.0-M15.jar
2016-08-08 19:13:53,658 INFO [main] org.apache.spark.deploy.yarn.Client: Uploading resource file:/opt/cloudera/parcels/CDH-5.8.0-1.cdh5.8.0.p0.42/lib/hadoop-mapreduce/api-asn1-api-1.0.0-M20.jar -> hdfs://ns/user/hdfs/.sparkStaging/application_1469502027340_0471/api-asn1-api-1.0.0-M20.jar
2016-08-08 19:13:55,297 INFO [main] org.apache.spark.deploy.yarn.Client: Uploading resource file:/opt/cloudera/parcels/CDH-5.8.0-1.cdh5.8.0.p0.42/lib/hadoop-mapreduce/api-util-1.0.0-M20.jar -> hdfs://ns/user/hdfs/.sparkStaging/application_1469502027340_0471/api-util-1.0.0-M20.jar
2016-08-08 19:13:56,768 INFO [main] org.apache.spark.deploy.yarn.Client: Uploading resource file:/opt/cloudera/parcels/CDH-5.8.0-1.cdh5.8.0.p0.42/lib/hadoop-mapreduce/commons-beanutils-1.7.0.jar -> hdfs://ns/user/hdfs/.sparkStaging/application_1469502027340_0471/commons-beanutils-1.7.0.jar
...
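
One way to put a number on the delay is to parse the "Uploading resource" lines themselves. The following is a minimal sketch, assuming the log format shown above; the function name `summarize_uploads` is made up for illustration and is not part of Oozie or Spark.

```python
import re
from datetime import datetime

# Matches lines like:
# 2016-08-08 19:10:30,312 INFO [main] ...Client: Uploading resource file:/src.jar -> hdfs://.../src.jar
UPLOAD_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) INFO .*"
    r"Uploading resource (\S+) -> (\S+)$"
)


def summarize_uploads(log_text):
    """Return (upload_count, seconds between first and last upload line)."""
    stamps = []
    for line in log_text.splitlines():
        match = UPLOAD_RE.match(line.strip())
        if match:
            # %f accepts the three millisecond digits after the comma.
            stamps.append(
                datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S,%f")
            )
    if not stamps:
        return 0, 0.0
    return len(stamps), (max(stamps) - min(stamps)).total_seconds()
```

Passing the full launcher log as a string returns how many jars were uploaded and how long the upload phase lasted, which makes it easier to compare runs before and after a configuration change.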

 

Can we skip uploading the Hadoop jars to speed up the workflow?


Re: Oozie Spark Action - jars upload issue


I'd also be interested in help with this issue.

We are using CDH 5.9.1 and, according to the logs, Oozie uploads ~200 jars every time it runs a workflow (not counting the jars in Oozie's ShareLib, which are not uploaded). The files come from:

  • /opt/cloudera/parcels/CDH-5.9.1-1.cdh5.9.1.p0.4/lib/hadoop/
  • /opt/cloudera/parcels/CDH-5.9.1-1.cdh5.9.1.p0.4/lib/hadoop/lib/
  • /opt/cloudera/parcels/CDH-5.9.1-1.cdh5.9.1.p0.4/lib/hadoop-hdfs/
  • /opt/cloudera/parcels/CDH-5.9.1-1.cdh5.9.1.p0.4/lib/hadoop-hdfs/lib/
  • /opt/cloudera/parcels/CDH-5.9.1-1.cdh5.9.1.p0.4/lib/hadoop-yarn/
  • /opt/cloudera/parcels/CDH-5.9.1-1.cdh5.9.1.p0.4/lib/hadoop-yarn/lib/
  • /opt/cloudera/parcels/CDH-5.9.1-1.cdh5.9.1.p0.4/lib/hadoop-mapreduce/
  • /opt/cloudera/parcels/CDH-5.9.1-1.cdh5.9.1.p0.4/lib/hadoop-mapreduce/lib/

 

Of those 199 jars, at least 88 will never be used (we don't use MRv1 at all).
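
A per-directory breakdown like the one above can be produced from the launcher log. This is a rough sketch, assuming the same "Uploading resource" log format as in the original post; `uploads_per_directory` is a hypothetical helper name.

```python
import posixpath
import re
from collections import Counter

# Captures the local source path from an "Uploading resource file:... -> ..." line.
SOURCE_RE = re.compile(r"Uploading resource file:(\S+) -> ")


def uploads_per_directory(log_text):
    """Count uploaded files per local source directory."""
    counts = Counter()
    for line in log_text.splitlines():
        match = SOURCE_RE.search(line)
        if match:
            counts[posixpath.dirname(match.group(1))] += 1
    return counts
```

Calling `uploads_per_directory(log_text).most_common()` lists the directories that contribute the most jars first, which helps show whether the MapReduce directories dominate the upload.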

 

I *think* this comes from YARN's configuration: yarn.application.classpath and mapreduce.application.classpath together would account for all the jars being uploaded.
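
To check that guess, the entries of those two properties can be listed and compared with the directories seen in the log. The sketch below assumes the standard Hadoop `<configuration>/<property>` XML layout and comma-separated classpath values; `classpath_entries` is a hypothetical helper, and it only inspects the configuration, it does not change what gets uploaded.

```python
import xml.etree.ElementTree as ET


def classpath_entries(site_xml, prop_name):
    """Return the comma-separated entries of one property in a *-site.xml string."""
    root = ET.fromstring(site_xml)
    for prop in root.iter("property"):
        if (prop.findtext("name") or "").strip() == prop_name:
            value = prop.findtext("value") or ""
            return [entry.strip() for entry in value.split(",") if entry.strip()]
    return []  # property not set in this file
```

For example, reading yarn-site.xml into a string and calling `classpath_entries(text, "yarn.application.classpath")` gives one list item per classpath entry; the same call with mapred-site.xml and "mapreduce.application.classpath" covers the second property.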

 

I've tried uploading these jar files to a custom location in HDFS and changing these settings, without much success (maybe because of the colon in the HDFS URL?).

 

Any help with this is appreciated.