
Pig on Spark (How to use spark as an execution engine for pig scripts)


Hello,

I would like to execute a Pig script using Spark as the execution engine. Is there any way to do so? Below is the Jira link for this feature, but I couldn't get it to work. Any help with this would be appreciated.

https://issues.apache.org/jira/browse/PIG-4059

Regards,

Mitesh

1 ACCEPTED SOLUTION

Contributor

As pig-on-spark is not currently supported in HDP (and it's not in the immediate plans either), you might want to raise this question on the Apache Pig mailing list itself, where some of the developers working on it might be able to respond.

As mentioned previously, pig-on-tez is a lot more mature, as it has been in production use for a few years now.


3 REPLIES

Contributor

Pig on Spark is a very new feature, and it is still not part of an official Apache release. It is likely to take more time before it is widely used and recommended for production use.

Pig on Tez has been around for some time and has been used in production in many large installations. I would recommend using that over Pig on Spark.


Hi Tejas,

Thanks for your response. Agreed, it will be officially available once Pig 0.17 is released. For now, though, there is a GitHub repo for Pig 0.17 that can be used in a development environment to run Spark as the execution engine. I already have Spark working in local mode, but I am facing an issue in yarn-client mode.

Error for yarn-client mode:

sshuser@hn0-dfspar:~/pig/bin$ ./pig -x spark
Using Spark Home: /usr/hdp/current/spark-client
17/06/02 05:41:54 INFO pig.ExecTypeProvider: Trying ExecType : LOCAL
17/06/02 05:41:54 INFO pig.ExecTypeProvider: Trying ExecType : MAPREDUCE
17/06/02 05:41:54 INFO pig.ExecTypeProvider: Trying ExecType : TEZ_LOCAL
17/06/02 05:41:54 INFO pig.ExecTypeProvider: Trying ExecType : TEZ
17/06/02 05:41:54 INFO pig.ExecTypeProvider: Trying ExecType : SPARK
17/06/02 05:41:54 INFO pig.ExecTypeProvider: Picked SPARK as the ExecType
17/06/02 05:41:54 ERROR pig.Main: ERROR 2998: Unhandled internal error. org/apache/spark/scheduler/SparkListener
17/06/02 05:41:54 WARN pig.Main: There is no log file to write to.
17/06/02 05:41:54 ERROR pig.Main: java.lang.NoClassDefFoundError: org/apache/spark/scheduler/SparkListener
    at org.apache.pig.backend.hadoop.executionengine.spark.SparkExecutionEngine.<init>(SparkExecutionEngine.java:35)
    at org.apache.pig.backend.hadoop.executionengine.spark.SparkExecType.getExecutionEngine(SparkExecType.java:42)
    at org.apache.pig.impl.PigContext.<init>(PigContext.java:269)
    at org.apache.pig.impl.PigContext.<init>(PigContext.java:256)
    at org.apache.pig.Main.run(Main.java:389)
    at org.apache.pig.Main.main(Main.java:175)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:233)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:148)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.scheduler.SparkListener
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    ... 12 more

Regards,

Mitesh
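
Editor's note: the trace above ends in a ClassNotFoundException for org.apache.spark.scheduler.SparkListener, which usually means that no jar providing that class was on the JVM classpath when Pig started. One way to narrow this down is to check whether any jar under the reported Spark home actually contains the class. The following is a minimal diagnostic sketch in Python, not a fix and not part of Pig or Spark; the helper name jars_containing is hypothetical, and it assumes the Spark jars are ordinary zip archives somewhere below the directory you pass in.

```python
import zipfile
from pathlib import Path

# The class named in the NoClassDefFoundError, written as a path inside a jar.
CLASS_ENTRY = "org/apache/spark/scheduler/SparkListener.class"


def jars_containing(root, entry=CLASS_ENTRY):
    """Return the jar files under `root` (searched recursively) that contain `entry`.

    Hypothetical helper for illustration only.
    """
    matches = []
    for jar in sorted(Path(root).rglob("*.jar")):
        try:
            with zipfile.ZipFile(jar) as archive:
                if entry in archive.namelist():
                    matches.append(jar)
        except (zipfile.BadZipFile, OSError):
            # Skip corrupt archives, dangling symlinks, and unreadable files
            # instead of aborting the whole scan.
            continue
    return matches
```

Calling jars_containing("/usr/hdp/current/spark-client") on the node from the log would list candidate jars, if any exist there. An empty result would suggest the class lives elsewhere or is packaged differently in that Spark build; a non-empty result would suggest the jar exists but is not reaching Pig's classpath in this mode. Either way, how Pig picks up Spark jars is a question for the Pig documentation or mailing list.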
