Member since: 10-28-2016
Posts: 8
Kudos Received: 2
Solutions: 1
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 2605 | 06-16-2017 04:38 PM
06-16-2017 04:38 PM
1 Kudo
OK, just to update: I followed the directions in the link provided by dsun (here) exactly. On HDP 2.6 with Oozie 4.2, this fails due to a known bug (jira). Basically, what works with Spark 1.6 will not work with Spark 2.1 (via Oozie, anyway) because of a change in how Spark handles multiple files found in the distributed cache (see here):

java.lang.IllegalArgumentException: Attempt to add (hdfs://hdpcluster/user/oozie/share/lib/lib_20170411215324/oozie/aws-java-sdk-core-1.10.6.jar) multiple times to the distributed cache.

I've tried removing duplicate files, but there are so many (some even duplicated between the oozie sharelib and the spark2 sharelib) that I'm afraid removing them all would break 1.6 (and with it the ability to run any existing jobs under 1.6). It looks like this may be fixed in Oozie 4.3, but I'm not sure how to update just the Oozie service using Ambari (maybe I'll post another question for that).

EDIT: After removing all duplicate files found between the oozie and spark2 sharelibs, I still could not run a Spark2 job from Oozie 4.2. I was getting an ImportError for a custom Python file I was trying to import from the main application .py file. It seems Oozie wasn't setting --py-files correctly (again, this worked fine with Spark 1.6). In conclusion, this is experimental at best. Hopefully the next version of HDP will ship the latest Oozie 4.3.
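In case it helps anyone hitting the same exception: here's a minimal sketch of how you might list the jar basenames that appear in both sharelibs before deciding what to remove. The `find_dupes` name and the local directories are my own for illustration; on a real cluster you'd compare listings of the HDFS sharelib directories (e.g. via `hdfs dfs -ls`) instead.

```shell
# Sketch: print jar basenames present in BOTH sharelib directories.
# On a real cluster, pull the listings from HDFS first, e.g.:
#   hdfs dfs -ls /user/oozie/share/lib/lib_<timestamp>/oozie
# Here two local directories stand in for the oozie and spark2 sharelibs.
find_dupes() {
  for f in "$1"/*.jar; do
    b=$(basename "$f")
    # A jar with the same basename in the second sharelib is a duplicate
    [ -e "$2/$b" ] && echo "$b"
  done
}
```

This only compares basenames, which is what the distributed-cache check in the exception above trips on; versions embedded in the filename (e.g. `-1.10.6`) must match for a jar to be flagged.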
06-16-2017 01:59 PM
1 Kudo
Hi Artem, do you have a Hortonworks link stating that Spark2 is not officially supported in HDP via Oozie? I want to implement Spark2 via Oozie on HDP 2.6, and it seems from this doc that Spark2 via Oozie (Oozie 4.2 in HDP 2.6) IS possible. Perhaps the poster didn't copy some libraries or jars to the spark2 sharelib? (Again, see link.)
06-16-2017 01:40 PM
Thank you dsun! I'm working on these steps today. It seems from the instructions that once the sharelib for spark2 is set up, I can switch a given workflow to point to spark2 by specifying in job.properties:

oozie.action.sharelib.for.spark=spark2

This would imply (I assume) that I can easily point back to using Spark 1.6.3 by specifying:

oozie.action.sharelib.for.spark=spark

Is my assumption correct?
06-15-2017 08:15 PM
I have both Spark 1.6 and 2.0 installed on my cluster. I see in the docs how to manually run a spark-submit job and choose 2.0 (here). However, I launch my jobs using Oozie. Is there a way to specify, for a given Oozie workflow spark action, that I want to use the 2.0 engine instead of 1.6?
Labels:
- Apache Oozie
- Apache Spark
04-25-2017 04:24 PM
Not sure if this is the problem, but how many executors are working on the insert (when viewing your job via the Spark UI)? Are you setting executor-cores?
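For reference, a sketch of how those settings are typically passed to spark-submit (the values here are placeholders for illustration, not recommendations for your workload):

```
# Command sketch: controlling executor parallelism for the insert job.
# Tune the numbers to your cluster; these are example values only.
spark-submit \
  --num-executors 4 \
  --executor-cores 2 \
  --executor-memory 4g \
  myjob.py
```

The Executors tab of the Spark UI should then show the number of executors actually allocated, which is what I'd check first.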
04-17-2017 07:00 PM
I faced this issue recently. It turned out that one of my datanodes was decommissioned (due to earlier maintenance). You might try checking the list of datanodes on the dfshealth page. Using default ports, for me it was:

<<mynamenode>>:50070/dfshealth.html#tab-datanode

That lists the datanodes and their status (active, decommissioned, space, etc.). Also note that recent versions of Ambari provide that link under the HDFS Summary "Quick Links" dropdown (it's called "Namenode UI").
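If you prefer the command line over the web UI, the same information is available via dfsadmin (a sketch; assumes an HDFS client is on the path and you have permission to run admin reports):

```
# Full report: lists every datanode with its Decommission Status,
# configured capacity, and remaining space.
hdfs dfsadmin -report

# Narrow the report to nodes in a problem state:
hdfs dfsadmin -report -dead
hdfs dfsadmin -report -decommissioning
```

A node stuck in "Decommission in progress" or listed under dead nodes would explain the symptoms here.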