Sqoop 1 has a handy option, --skip-dist-cache, that stops Sqoop from copying its dependency jars into the distributed cache every time it launches its MapReduce 2 job. The application jar is always copied, and a "libjars" subdirectory is created for the dependencies (hundreds of MB of files, which slows down every run).
As far as I can tell the jar dependencies are linked here:
All nodes already have this path thanks to CM, but it would take little effort to push the jars to a fixed HDFS folder once per upgrade for use with the distributed cache. Oozie somehow does exactly this, but how to do the same yourself isn't really documented.
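Roughly what I have in mind, as an untested sketch. The local directory and the HDFS folder below are placeholders I made up, not real paths from my cluster, and the helper function names are hypothetical. It relies on the generic Hadoop -libjars argument accepting HDFS URIs, and on Sqoop honoring it together with --skip-dist-cache; I have not confirmed either for the job Sqoop generates.

```shell
#!/usr/bin/env bash
# Placeholders (assumptions): adjust both to your environment.
LOCAL_LIB="${LOCAL_LIB:-/path/to/sqoop/lib}"      # local dir holding the dependency jars
HDFS_LIB="${HDFS_LIB:-hdfs:///apps/sqoop/lib}"    # hypothetical fixed HDFS folder

# Run once per upgrade: copy every jar to the fixed HDFS folder.
stage_jars() {
  local local_dir="$1" hdfs_dir="$2"
  hdfs dfs -mkdir -p "$hdfs_dir"
  hdfs dfs -put -f "$local_dir"/*.jar "$hdfs_dir"/
}

# Build the comma-separated list that -libjars expects,
# one HDFS URI per jar found in the local directory.
hdfs_jar_list() {
  local local_dir="$1" hdfs_dir="$2" list="" jar
  for jar in "$local_dir"/*.jar; do
    [ -e "$jar" ] || continue              # no jars matched the glob
    list="${list:+$list,}${hdfs_dir}/$(basename "$jar")"
  done
  printf '%s\n' "$list"
}

# Per job (generic Hadoop arguments go right after the tool name):
#   stage_jars "$LOCAL_LIB" "$HDFS_LIB"    # only after an upgrade
#   sqoop import -libjars "$(hdfs_jar_list "$LOCAL_LIB" "$HDFS_LIB")" \
#     --skip-dist-cache <usual import arguments>
```

The point of building the list from the local directory is that the per-job step never has to list HDFS; only the one-time staging step touches it.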
Does anyone know how to set a local or HDFS path for Sqoop's MR distributed cache? Maybe there is an unexposed setting in the MR2/YARN job that Sqoop creates on the destination side.