Pig on Spark (How to use spark as an execution engine for pig scripts)

Hello,

I would like to execute a Pig script using Spark as the execution engine. Is there any way to do so? The Jira issue below tracks this feature, but I couldn't get it to work. Any help with this would be appreciated.

https://issues.apache.org/jira/browse/PIG-4059

Regards,

Mitesh

1 ACCEPTED SOLUTION

Re: Pig on Spark (How to use spark as an execution engine for pig scripts)

Cloudera Employee

As pig-on-spark is not currently supported in HDP (and it's not in the immediate plans either), you might want to raise this question on the Apache Pig mailing list, where some of the developers working on it might be able to respond.

As mentioned previously, pig-on-tez is a lot more mature, as it has been in production use for a few years now.

3 REPLIES

Re: Pig on Spark (How to use spark as an execution engine for pig scripts)

Cloudera Employee

Pig on Spark is a very new feature, and it is still not part of an official Apache release. It is likely to take more time before it is widely used and recommended for production use.

Pig on Tez has been around for some time and has been used in production at many large installations. I would recommend using that over Pig on Spark.

Re: Pig on Spark (How to use spark as an execution engine for pig scripts)

Hi Tejas,

Thanks for your response. Yes, agreed that it will be available once Pig 0.17 is released, but for now, for development environments, there is a GitHub repo for Pig 0.17 that we can use to run Spark as the execution engine. I have already got Spark working in local mode, but I am facing an issue with yarn-client mode.

Error for yarn-client mode:

sshuser@hn0-dfspar:~/pig/bin$ ./pig -x spark
Using Spark Home: /usr/hdp/current/spark-client
17/06/02 05:41:54 INFO pig.ExecTypeProvider: Trying ExecType : LOCAL
17/06/02 05:41:54 INFO pig.ExecTypeProvider: Trying ExecType : MAPREDUCE
17/06/02 05:41:54 INFO pig.ExecTypeProvider: Trying ExecType : TEZ_LOCAL
17/06/02 05:41:54 INFO pig.ExecTypeProvider: Trying ExecType : TEZ
17/06/02 05:41:54 INFO pig.ExecTypeProvider: Trying ExecType : SPARK
17/06/02 05:41:54 INFO pig.ExecTypeProvider: Picked SPARK as the ExecType
17/06/02 05:41:54 ERROR pig.Main: ERROR 2998: Unhandled internal error. org/apache/spark/scheduler/SparkListener
17/06/02 05:41:54 WARN pig.Main: There is no log file to write to.
17/06/02 05:41:54 ERROR pig.Main: java.lang.NoClassDefFoundError: org/apache/spark/scheduler/SparkListener
    at org.apache.pig.backend.hadoop.executionengine.spark.SparkExecutionEngine.<init>(SparkExecutionEngine.java:35)
    at org.apache.pig.backend.hadoop.executionengine.spark.SparkExecType.getExecutionEngine(SparkExecType.java:42)
    at org.apache.pig.impl.PigContext.<init>(PigContext.java:269)
    at org.apache.pig.impl.PigContext.<init>(PigContext.java:256)
    at org.apache.pig.Main.run(Main.java:389)
    at org.apache.pig.Main.main(Main.java:175)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:233)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:148)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.scheduler.SparkListener
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    ... 12 more

Regards,

Mitesh
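
A note on the trace above: a NoClassDefFoundError for org/apache/spark/scheduler/SparkListener generally means the Spark core classes were not on Pig's classpath when the SparkExecutionEngine was constructed, so a reasonable first check is whether the reported Spark home contains the jar that would supply them. The sketch below is only a diagnostic aid, not a confirmed fix for this thread. The function name find_spark_core_jars is made up for illustration, and the two locations it looks in are assumptions based on the usual Spark layouts (an assembly jar under lib/ for Spark 1.x, spark-core_*.jar under jars/ for Spark 2.x).

```shell
#!/bin/sh
# Hypothetical helper: print the jars under a Spark home that would normally
# provide org.apache.spark.scheduler.SparkListener.
# Assumed layouts: lib/spark-assembly*.jar (Spark 1.x),
#                  jars/spark-core_*.jar   (Spark 2.x).
find_spark_core_jars() {
    spark_home=$1
    for jar in "$spark_home"/lib/spark-assembly*.jar \
               "$spark_home"/jars/spark-core_*.jar; do
        # An unmatched glob is left as a literal pattern, so keep only real files.
        if [ -f "$jar" ]; then
            printf '%s\n' "$jar"
        fi
    done
}

# Only runs when SPARK_HOME is set, e.g. to the path Pig printed at startup.
if [ -n "${SPARK_HOME:-}" ]; then
    find_spark_core_jars "$SPARK_HOME"
fi
```

If this prints nothing for /usr/hdp/current/spark-client, the launcher has no Spark jar to pick up from that directory. If it does print a jar, the next thing to look at is how the pig launcher script builds its classpath from SPARK_HOME. As far as I know, the Pig on Spark launcher also reads the SPARK_MASTER environment variable (for example yarn-client) and, for Spark 1.x, expects SPARK_JAR to point at an HDFS copy of the assembly jar, but please verify both against the branch you built.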
