New Contributor
Posts: 1
Registered: 08-07-2016

Oozie Spark Action - jars upload issue


We are using an Oozie workflow with a Spark action in YARN mode on CDH 5.8.0. When a job starts, it spends a long time in the prepare phase uploading the jars under '/opt/cloudera/parcels/CDH-5.8.0-1.cdh5.8.0.p0.42/lib/'. This process takes approximately 5 minutes.

 

Here are a few lines from the output logs:

2016-08-08 19:10:30,312 INFO [main] org.apache.spark.deploy.yarn.Client: Uploading resource file:/opt/cloudera/parcels/CDH-5.8.0-1.cdh5.8.0.p0.42/lib/hadoop/hadoop-annotations.jar -> hdfs://ns/user/hdfs/.sparkStaging/application_1469502027340_0471/hadoop-annotations.jar
2016-08-08 19:10:31,921 INFO [main] org.apache.spark.deploy.yarn.Client: Uploading resource file:/opt/cloudera/parcels/CDH-5.8.0-1.cdh5.8.0.p0.42/lib/hadoop/hadoop-auth.jar -> hdfs://ns/user/hdfs/.sparkStaging/application_1469502027340_0471/hadoop-auth.jar
2016-08-08 19:10:32,911 INFO [main] org.apache.spark.deploy.yarn.Client: Uploading resource file:/opt/cloudera/parcels/CDH-5.8.0-1.cdh5.8.0.p0.42/lib/hadoop/hadoop-aws.jar -> hdfs://ns/user/hdfs/.sparkStaging/application_1469502027340_0471/hadoop-aws.jar
...
2016-08-08 19:12:14,041 INFO [main] org.apache.spark.deploy.yarn.Client: Uploading resource file:/opt/cloudera/parcels/CDH-5.8.0-1.cdh5.8.0.p0.42/lib/hadoop-hdfs/hadoop-hdfs-nfs.jar -> hdfs://ns/user/hdfs/.sparkStaging/application_1469502027340_0471/hadoop-hdfs-nfs.jar
2016-08-08 19:12:14,916 INFO [main] org.apache.spark.deploy.yarn.Client: Uploading resource file:/opt/cloudera/parcels/CDH-5.8.0-1.cdh5.8.0.p0.42/lib/hadoop-hdfs/hadoop-hdfs-tests.jar -> hdfs://ns/user/hdfs/.sparkStaging/application_1469502027340_0471/hadoop-hdfs-tests.jar
2016-08-08 19:12:17,184 INFO [main] org.apache.spark.deploy.yarn.Client: Uploading resource file:/opt/cloudera/parcels/CDH-5.8.0-1.cdh5.8.0.p0.42/lib/hadoop-hdfs/hadoop-hdfs.jar -> hdfs://ns/user/hdfs/.sparkStaging/application_1469502027340_0471/hadoop-hdfs.jar
2016-08-08 19:12:20,331 INFO [main] org.apache.spark.deploy.yarn.Client: Uploading resource file:/opt/cloudera/parcels/CDH-5.8.0-1.cdh5.8.0.p0.42/lib/hadoop-hdfs/hadoop-hdfs-nfs-2.6.0-cdh5.8.0.jar -> hdfs://ns/user/hdfs/.sparkStaging/application_1469502027340_0471/hadoop-hdfs-nfs-2.6.0-cdh5.8.0.jar
...
2016-08-08 19:12:40,483 INFO [main] org.apache.spark.deploy.yarn.Client: Uploading resource file:/opt/cloudera/parcels/CDH-5.8.0-1.cdh5.8.0.p0.42/lib/hadoop-yarn/hadoop-yarn-api.jar -> hdfs://ns/user/hdfs/.sparkStaging/application_1469502027340_0471/hadoop-yarn-api.jar
2016-08-08 19:12:41,400 INFO [main] org.apache.spark.deploy.yarn.Client: Uploading resource file:/opt/cloudera/parcels/CDH-5.8.0-1.cdh5.8.0.p0.42/lib/hadoop-yarn/hadoop-yarn-applications-distributedshell.jar -> hdfs://ns/user/hdfs/.sparkStaging/application_1469502027340_0471/hadoop-yarn-applications-distributedshell.jar
2016-08-08 19:12:42,386 INFO [main] org.apache.spark.deploy.yarn.Client: Uploading resource file:/opt/cloudera/parcels/CDH-5.8.0-1.cdh5.8.0.p0.42/lib/hadoop-yarn/hadoop-yarn-applications-unmanaged-am-launcher.jar -> hdfs://ns/user/hdfs/.sparkStaging/application_1469502027340_0471/hadoop-yarn-applications-unmanaged-am-launcher.jar
2016-08-08 19:12:43,615 INFO [main] org.apache.spark.deploy.yarn.Client: Uploading resource file:/opt/cloudera/parcels/CDH-5.8.0-1.cdh5.8.0.p0.42/lib/hadoop-yarn/hadoop-yarn-client.jar -> hdfs://ns/user/hdfs/.sparkStaging/application_1469502027340_0471/hadoop-yarn-client.jar
2016-08-08 19:12:44,632 INFO [main] org.apache.spark.deploy.yarn.Client: Uploading resource file:/opt/cloudera/parcels/CDH-5.8.0-1.cdh5.8.0.p0.42/lib/hadoop-yarn/hadoop-yarn-common.jar -> hdfs://ns/user/hdfs/.sparkStaging/application_1469502027340_0471/hadoop-yarn-common.jar
...
2016-08-08 19:13:50,199 INFO [main] org.apache.spark.deploy.yarn.Client: Uploading resource file:/opt/cloudera/parcels/CDH-5.8.0-1.cdh5.8.0.p0.42/lib/hadoop-mapreduce/apacheds-i18n-2.0.0-M15.jar -> hdfs://ns/user/hdfs/.sparkStaging/application_1469502027340_0471/apacheds-i18n-2.0.0-M15.jar
2016-08-08 19:13:51,934 INFO [main] org.apache.spark.deploy.yarn.Client: Uploading resource file:/opt/cloudera/parcels/CDH-5.8.0-1.cdh5.8.0.p0.42/lib/hadoop-mapreduce/apacheds-kerberos-codec-2.0.0-M15.jar -> hdfs://ns/user/hdfs/.sparkStaging/application_1469502027340_0471/apacheds-kerberos-codec-2.0.0-M15.jar
2016-08-08 19:13:53,658 INFO [main] org.apache.spark.deploy.yarn.Client: Uploading resource file:/opt/cloudera/parcels/CDH-5.8.0-1.cdh5.8.0.p0.42/lib/hadoop-mapreduce/api-asn1-api-1.0.0-M20.jar -> hdfs://ns/user/hdfs/.sparkStaging/application_1469502027340_0471/api-asn1-api-1.0.0-M20.jar
2016-08-08 19:13:55,297 INFO [main] org.apache.spark.deploy.yarn.Client: Uploading resource file:/opt/cloudera/parcels/CDH-5.8.0-1.cdh5.8.0.p0.42/lib/hadoop-mapreduce/api-util-1.0.0-M20.jar -> hdfs://ns/user/hdfs/.sparkStaging/application_1469502027340_0471/api-util-1.0.0-M20.jar
2016-08-08 19:13:56,768 INFO [main] org.apache.spark.deploy.yarn.Client: Uploading resource file:/opt/cloudera/parcels/CDH-5.8.0-1.cdh5.8.0.p0.42/lib/hadoop-mapreduce/commons-beanutils-1.7.0.jar -> hdfs://ns/user/hdfs/.sparkStaging/application_1469502027340_0471/commons-beanutils-1.7.0.jar
...

 

Can we skip uploading these Hadoop jars to speed up the workflow?
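
For context, the workflow action is essentially a plain Spark action along the lines of the sketch below (the class name, application name and paths are placeholders, not our real job):

<action name="spark-job">
    <spark xmlns="uri:oozie:spark-action:0.1">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <master>yarn-cluster</master>
        <name>example-spark-job</name>
        <class>com.example.ExampleApp</class>
        <jar>${nameNode}/user/hdfs/apps/example-app.jar</jar>
        <spark-opts>--executor-memory 2G --num-executors 4</spark-opts>
    </spark>
    <ok to="end"/>
    <error to="fail"/>
</action>

The slow part is not our own application jar; it is the Hadoop jars from the local parcel directory shown in the log above.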

New Contributor
Posts: 2
Registered: 05-09-2017

Re: Oozie Spark Action - jars upload issue

I'd also be interested in help with this issue.

 

We are using CDH 5.9.1 and, looking at the logs, Oozie uploads ~200 jars every time it runs a workflow (not counting the ones in Oozie's ShareLib, which are not being uploaded). The files come from:

  • /opt/cloudera/parcels/CDH-5.9.1-1.cdh5.9.1.p0.4/lib/hadoop/
  • /opt/cloudera/parcels/CDH-5.9.1-1.cdh5.9.1.p0.4/lib/hadoop/lib/
  • /opt/cloudera/parcels/CDH-5.9.1-1.cdh5.9.1.p0.4/lib/hadoop-hdfs/
  • /opt/cloudera/parcels/CDH-5.9.1-1.cdh5.9.1.p0.4/lib/hadoop-hdfs/lib/
  • /opt/cloudera/parcels/CDH-5.9.1-1.cdh5.9.1.p0.4/lib/hadoop-yarn/
  • /opt/cloudera/parcels/CDH-5.9.1-1.cdh5.9.1.p0.4/lib/hadoop-yarn/lib/
  • /opt/cloudera/parcels/CDH-5.9.1-1.cdh5.9.1.p0.4/lib/hadoop-mapreduce/
  • /opt/cloudera/parcels/CDH-5.9.1-1.cdh5.9.1.p0.4/lib/hadoop-mapreduce/lib/

 

Of those 199 jars, there are at least 88 that will never be used (we don't use MRv1 at all).

 

I *think* that this is coming from YARN's configuration; yarn.application.classpath and mapreduce.application.classpath would account for all the jars being uploaded.
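
Those two properties are just comma-separated lists of local classpath globs, and on a parcel install the Hadoop home variables should expand to exactly the directories listed above. For reference, a rough sketch of what they typically look like (the values below are illustrative defaults, not copied from our cluster):

<!-- yarn-site.xml (illustrative values) -->
<property>
  <name>yarn.application.classpath</name>
  <value>$HADOOP_CLIENT_CONF_DIR,$HADOOP_CONF_DIR,$HADOOP_COMMON_HOME/*,$HADOOP_COMMON_HOME/lib/*,$HADOOP_HDFS_HOME/*,$HADOOP_HDFS_HOME/lib/*,$HADOOP_YARN_HOME/*,$HADOOP_YARN_HOME/lib/*</value>
</property>

<!-- mapred-site.xml (illustrative values) -->
<property>
  <name>mapreduce.application.classpath</name>
  <value>$HADOOP_MAPRED_HOME/*,$HADOOP_MAPRED_HOME/lib/*,$MR2_CLASSPATH</value>
</property>

On our parcel layout, $HADOOP_COMMON_HOME, $HADOOP_HDFS_HOME, $HADOOP_YARN_HOME and $HADOOP_MAPRED_HOME should resolve to the lib/hadoop, lib/hadoop-hdfs, lib/hadoop-yarn and lib/hadoop-mapreduce directories of the parcel, i.e. the eight directories listed above.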

 

I've tried uploading these jar files to a custom location in HDFS and changing these settings accordingly, without much success (maybe because of the colon in the HDFS URL?).
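
Roughly, the change I attempted looked like the following (the HDFS path is made up here; I'm not even sure this form is legal for these properties, which is why I suspect the colons in the URI are a problem):

<!-- attempted override (illustrative; the HDFS path is a placeholder) -->
<property>
  <name>yarn.application.classpath</name>
  <value>hdfs://ns/apps/hadoop-jars/*,$HADOOP_CLIENT_CONF_DIR,$HADOOP_CONF_DIR</value>
</property>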

 

Any help with this is appreciated.
