I'd also be interested in help with this issue. We are using CDH 5.9.1, and looking at the logs, Oozie uploads ~200 jars every time it runs a workflow (not to mention the ones in Oozie's ShareLib, which are not being uploaded). The files come from:

/opt/cloudera/parcels/CDH-5.9.1-1.cdh5.9.1.p0.4/lib/hadoop/
/opt/cloudera/parcels/CDH-5.9.1-1.cdh5.9.1.p0.4/lib/hadoop/lib/
/opt/cloudera/parcels/CDH-5.9.1-1.cdh5.9.1.p0.4/lib/hadoop-hdfs/
/opt/cloudera/parcels/CDH-5.9.1-1.cdh5.9.1.p0.4/lib/hadoop-hdfs/lib/
/opt/cloudera/parcels/CDH-5.9.1-1.cdh5.9.1.p0.4/lib/hadoop-yarn/
/opt/cloudera/parcels/CDH-5.9.1-1.cdh5.9.1.p0.4/lib/hadoop-yarn/lib/
/opt/cloudera/parcels/CDH-5.9.1-1.cdh5.9.1.p0.4/lib/hadoop-mapreduce/
/opt/cloudera/parcels/CDH-5.9.1-1.cdh5.9.1.p0.4/lib/hadoop-mapreduce/lib/

Of those 199 jars, there are at least 88 that will never be used (we don't use MRv1 at all). I *think* this is coming from YARN's configuration: yarn.application.classpath and mapreduce.application.classpath would account for all the jars being uploaded. I've tried uploading these jar files to a custom location in HDFS and changing those settings, without much success (maybe because of the colon in the hdfs:// URL?). Any help with this is appreciated.
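For anyone wanting to reproduce what I attempted, this is roughly the kind of override involved (a sketch only; the property names are the standard Hadoop ones, but the HDFS location below is illustrative, not the actual path we used):

```xml
<!-- Sketch only: the hdfs:// location is illustrative. I'm also not
     certain these properties accept hdfs:// URLs at all, since the
     colon may be parsed as a classpath-entry separator. -->
<property>
  <name>mapreduce.application.classpath</name>
  <value>hdfs://nameservice1/user/oozie/jars/*</value>
</property>
<property>
  <name>yarn.application.classpath</name>
  <value>hdfs://nameservice1/user/oozie/jars/*</value>
</property>
```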