Created on 03-23-2017 02:20 AM - edited 09-16-2022 04:19 AM
Hi Cloudera,
I have integrated Zeppelin with CDH 5.8.3 and tried running few pig examples as in Zeppelin tutorial and ran in to few issues.
PIG Code used:
%pig
bankText = load 'hdfs://nameservice1/user/rbodolla/emp.txt' using PigStorage(';');
bank = foreach bankText generate $0 as age, $1 as job, $2 as marital, $3 as education, $5 as balance;
bank = filter bank by age != '"age"';
bank = foreach bank generate (int)age, job, marital, (int) balance;
%pig.query
bank_data = filter bank by age < ${maxAge=40};
b = group bank_data by age;
foreach b generate group, COUNT($1);
Issue 1: Zeppelin does not support KMS HA. After removing KMS HA the issue has been resolved.
Issue 2:
INFO [2017-03-23 07:21:35,013] ({pool-2-thread-5} JobControlCompiler.java[getJob]:579) - This job cannot be converted run in-process
INFO [2017-03-23 07:21:35,019] ({pool-2-thread-5} Configuration.java[warnOnceIfDeprecated]:1049) - mapred.submit.replication is deprecated. Instead, use mapreduce.client.submit.file.replication
ERROR [2017-03-23 07:21:35,071] ({pool-2-thread-5} Job.java[run]:188) - Job failed
java.lang.NoSuchMethodError: org.codehaus.jackson.map.ObjectMapper.writerWithDefaultPrettyPrinter()Lorg/codehaus/jackson/map/ObjectWriter;
at org.apache.hadoop.crypto.key.kms.KMSClientProvider.writeJson(KMSClientProvider.java:211)
at org.apache.hadoop.crypto.key.kms.KMSClientProvider.call(KMSClientProvider.java:448)
at org.apache.hadoop.crypto.key.kms.KMSClientProvider.call(KMSClientProvider.java:439)
at org.apache.hadoop.crypto.key.kms.KMSClientProvider.decryptEncryptedKey(KMSClientProvider.java:712)
at org.apache.hadoop.crypto.key.KeyProviderCryptoExtension.decryptEncryptedKey(KeyProviderCryptoExtension.java:388)
at org.apache.hadoop.hdfs.DFSClient.decryptEncryptedDataEncryptionKey(DFSClient.java:1358)
at org.apache.hadoop.hdfs.DFSClient.createWrappedOutputStream(DFSClient.java:1457)
at org.apache.hadoop.hdfs.DFSClient.createWrappedOutputStream(DFSClient.java:1442)
at org.apache.hadoop.hdfs.DistributedFileSystem$6.doCall(DistributedFileSystem.java:400)
at org.apache.hadoop.hdfs.DistributedFileSystem$6.doCall(DistributedFileSystem.java:393)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:393)
at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:337)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:908)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:889)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:786)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:775)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.shipToHDFS(JobControlCompiler.java:1805)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.putJarOnClassPathThroughDistributedCache(JobControlCompiler.java:1686)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getJob(JobControlCompiler.java:648)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.compile(JobControlCompiler.java:323)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:196)
at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.launchPig(HExecutionEngine.java:308)
at org.apache.pig.PigServer.launchPlan(PigServer.java:1474)
at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1459)
at org.apache.pig.PigServer.storeEx(PigServer.java:1118)
at org.apache.pig.PigServer.store(PigServer.java:1081)
at org.apache.pig.PigServer.openIterator(PigServer.java:994)
at org.apache.zeppelin.pig.PigQueryInterpreter.interpret(PigQueryInterpreter.java:104)
at org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:94)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:489)
at org.apache.zeppelin.scheduler.Job.run(Job.java:175)
at org.apache.zeppelin.scheduler.FIFOScheduler$1.run(FIFOScheduler.java:139)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
INFO [2017-03-23 07:21:35,079] ({pool-2-thread-5} SchedulerFactory.java[jobFinished]:137) - Job remoteInterpretJob_1490253694068 finished by scheduler org.apache.zeppelin.pig.PigInterpreter363704768
Created 03-23-2017 04:28 AM
This has been resolved after adding latest jackson-core-2.5.3.jar,jackson-core-asl-1.9.13.jar,jackson-mapper-asl-1.9.13.jar to pig interpreter path. By default, Zeppelin doesn't copy these latest files.
Created 03-23-2017 04:28 AM
This has been resolved after adding latest jackson-core-2.5.3.jar,jackson-core-asl-1.9.13.jar,jackson-mapper-asl-1.9.13.jar to pig interpreter path. By default, Zeppelin doesn't copy these latest files.