Pig on Spark (How to use spark as an execution engine for pig scripts)

galamitesh1005 — Thu, 01 Jun 2017 13:49:27 GMT

Hello,

I would like to execute a Pig script using Spark as the execution engine. Is there any way to do so? Below is the Jira link for the same issue, but I couldn't make it work. Any help with this would be appreciated.

https://issues.apache.org/jira/browse/PIG-4059

Regards,

Mitesh

Re: Pig on Spark (How to use spark as an execution engine for pig scripts)

thejas — Fri, 02 Jun 2017 01:37:48 GMT

Pig on Spark is a very new feature, and it is still not part of an official Apache release. It is likely to take more time before it is widely used and recommended for production use.

Pig on Tez has been around for some time and has been used in production at many large installations. I would recommend using that over Pig on Spark.

Re: Pig on Spark (How to use spark as an execution engine for pig scripts)

galamitesh1005 — Sat, 03 Jun 2017 03:17:06 GMT

Hi Tejas,

Thanks for your response. Yes, agreed that it will be available once Pig 0.17 is released, but for now there is a GitHub repo for Pig 0.17 that can be used in a development environment to run Pig with Spark as the execution engine. I already have Spark working in local mode, but I am facing an issue with yarn-client mode.

Error for yarn-client mode:

sshuser@hn0-dfspar:~/pig/bin$ ./pig -x spark
Using Spark Home: /usr/hdp/current/spark-client
17/06/02 05:41:54 INFO pig.ExecTypeProvider: Trying ExecType : LOCAL
17/06/02 05:41:54 INFO pig.ExecTypeProvider: Trying ExecType : MAPREDUCE
17/06/02 05:41:54 INFO pig.ExecTypeProvider: Trying ExecType : TEZ_LOCAL
17/06/02 05:41:54 INFO pig.ExecTypeProvider: Trying ExecType : TEZ
17/06/02 05:41:54 INFO pig.ExecTypeProvider: Trying ExecType : SPARK
17/06/02 05:41:54 INFO pig.ExecTypeProvider: Picked SPARK as the ExecType
17/06/02 05:41:54 ERROR pig.Main: ERROR 2998: Unhandled internal error. org/apache/spark/scheduler/SparkListener
17/06/02 05:41:54 WARN pig.Main: There is no log file to write to.
17/06/02 05:41:54 ERROR pig.Main: java.lang.NoClassDefFoundError: org/apache/spark/scheduler/SparkListener
    at org.apache.pig.backend.hadoop.executionengine.spark.SparkExecutionEngine.<init>(SparkExecutionEngine.java:35)
    at org.apache.pig.backend.hadoop.executionengine.spark.SparkExecType.getExecutionEngine(SparkExecType.java:42)
    at org.apache.pig.impl.PigContext.<init>(PigContext.java:269)
    at org.apache.pig.impl.PigContext.<init>(PigContext.java:256)
    at org.apache.pig.Main.run(Main.java:389)
    at org.apache.pig.Main.main(Main.java:175)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:233)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:148)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.scheduler.SparkListener
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    ... 12 more
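
The NoClassDefFoundError above means the JVM that launched Pig could not load org.apache.spark.scheduler.SparkListener, which usually suggests that the Spark jars did not end up on Pig's classpath. As a first diagnostic, a minimal sketch like the following could scan a Spark installation for a jar that actually contains that class. The helper name jars_containing is hypothetical, and the default directory is simply the "Spark Home" path printed in the log; neither is part of Pig or Spark.

```python
import os
import zipfile
from pathlib import Path

# Class named in the NoClassDefFoundError, written as an entry path inside a jar.
MISSING_CLASS = "org/apache/spark/scheduler/SparkListener.class"


def jars_containing(root, class_entry=MISSING_CLASS):
    """Return the jars under `root` that contain `class_entry` (hypothetical helper)."""
    hits = []
    for jar in sorted(Path(root).rglob("*.jar")):
        try:
            with zipfile.ZipFile(jar) as zf:
                if class_entry in zf.namelist():
                    hits.append(str(jar))
        except (zipfile.BadZipFile, OSError):
            # Skip corrupt archives and dangling symlinks instead of aborting the scan.
            continue
    return hits


if __name__ == "__main__":
    # Assumption: fall back to the Spark Home path shown in the log output.
    spark_home = os.environ.get("SPARK_HOME", "/usr/hdp/current/spark-client")
    found = jars_containing(spark_home)
    if found:
        print("SparkListener found in:")
        for jar in found:
            print("  " + jar)
    else:
        print("No jar under", spark_home, "contains", MISSING_CLASS)
```

If a jar is reported, the next thing to check would be whether the pig launcher script actually adds it to the classpath; if none is reported, SPARK_HOME probably points at a directory without the Spark runtime jars.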

Regards,

Mitesh

Re: Pig on Spark (How to use spark as an execution engine for pig scripts)

thejas — Sat, 03 Jun 2017 03:37:33 GMT

As Pig on Spark is not currently supported in HDP (and it's not in the immediate plans either), you might want to raise this question on the Apache Pig mailing list itself, where some of the developers working on it might be able to respond.

As mentioned previously, Pig on Tez is a lot more mature, as it has been in production use for a few years now.