02-23-2016 03:32 PM
Dear Cloudera support,
We have a Spark job that needs to run as an Oozie workflow on CDH 5.5.0.
1. The Spark job runs successfully when submitted from the command line with spark-submit, as below:
spark-submit --master spark://ip-10-0-4-248.us-west-1.compute.internal:7077 --class com.gridx.spark.MeterReadingLoader --name 'smud_test1' --driver-class-path /opt/cloudera/parcels/CDH/jars/guava-16.0.1.jar:/opt/cloudera/parcels/CDH-5.5.0-1.cdh5.5.0.p0.8/jars/jets3t-0.9.0.jar --conf spark.executor.extraClassPath=/opt/cloudera/parcels/CDH-5.5.0-1.cdh5.5.0.p0.8/jars/jets3t-0.9.0.jar ~/spark-all.jar -i s3n://meter-data/batch_data_phase1/smud_phase1_10.csv -k smud_stage -h 10.0.4.243 -t 60 -z America/Los_Angeles -l smud_test1 -g SMUD
2. However, when we submit the same Spark job through the CDH Oozie REST API or through Hue's Oozie editor, using the REST call shown further down, it starts an Oozie launcher job, "oozie:launcher:T=spark:W=meter_reading_loader:A=spark-17c0:ID=0000027-160202081901924-oozie-oozi-W", which fails with OutOfMemory (OOM) and PermGen errors.
BTW, our jar "spark-all.jar" is 88M in size.
I also tried raising the maximum PermGen size (MaxPermSize) and the memory, and also increased the memory for the Oozie launcher itself in workflow.xml as follows, but still had no luck. I'm not sure whether I am increasing the memory in the right way. The workflow XML for Oozie is as follows. Please correct me if I'm wrong.
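To make the question concrete, here is a minimal sketch of the kind of launcher settings I mean. It is illustrative only and not our exact workflow.xml: the workflow and action names are taken from the launcher job name above, while the memory values, the ${jobTracker} and ${nameNode} parameters, and the HDFS jar path are placeholder assumptions. The oozie.launcher.* prefix is what passes a setting to the launcher job rather than to the action itself.

```xml
<!-- Illustrative sketch only; NOT the exact workflow.xml used in this post. -->
<workflow-app name="meter_reading_loader" xmlns="uri:oozie:workflow:0.5">
  <start to="spark-17c0"/>
  <action name="spark-17c0">
    <spark xmlns="uri:oozie:spark-action:0.1">
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <configuration>
        <!-- YARN container size for the Oozie launcher map task (value is an assumption) -->
        <property>
          <name>oozie.launcher.mapreduce.map.memory.mb</name>
          <value>4096</value>
        </property>
        <!-- Heap and PermGen for the launcher JVM, kept below the container size (values are assumptions) -->
        <property>
          <name>oozie.launcher.mapreduce.map.java.opts</name>
          <value>-Xmx3072m -XX:MaxPermSize=512m</value>
        </property>
      </configuration>
      <master>spark://ip-10-0-4-248.us-west-1.compute.internal:7077</master>
      <name>smud_test1</name>
      <class>com.gridx.spark.MeterReadingLoader</class>
      <!-- Hypothetical HDFS location for the application jar -->
      <jar>${nameNode}/user/${wf:user()}/lib/spark-all.jar</jar>
      <spark-opts>--driver-class-path /opt/cloudera/parcels/CDH/jars/guava-16.0.1.jar:/opt/cloudera/parcels/CDH-5.5.0-1.cdh5.5.0.p0.8/jars/jets3t-0.9.0.jar --conf spark.executor.extraClassPath=/opt/cloudera/parcels/CDH-5.5.0-1.cdh5.5.0.p0.8/jars/jets3t-0.9.0.jar</spark-opts>
      <!-- Application arguments, one <arg> per token; only the first pair is shown -->
      <arg>-i</arg>
      <arg>s3n://meter-data/batch_data_phase1/smud_phase1_10.csv</arg>
    </spark>
    <ok to="end"/>
    <error to="kill"/>
  </action>
  <kill name="kill">
    <message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
  </kill>
  <end name="end"/>
</workflow-app>
```

With a standalone master the Spark driver runs inside the launcher JVM, which is why the launcher's heap and PermGen settings, and not only the executor settings, seem relevant here.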
We used the following Oozie REST API call to start the workflow:
curl -X POST -H "Content-Type: application/xml" -d @config.xml http://localhost:11000/oozie/v2/jobs?action=start
Can you help out? Thanks very much!
02-28-2016 10:19 PM