Member since
07-21-2017
2
Posts
0
Kudos Received
0
Solutions
07-27-2017
09:28 PM
Thank you @rbiswas for your reply. When we run the script from the Grunt shell (pig -useHCatalog), it works without problems. We tried it two ways: entering the script line by line in the Grunt shell, and running "pig -useHCatalog hdfs:path/to/script". Both worked. When we run it in an Oozie workflow, we get the following error:
2017-07-24 13:07:14,194 [PigTezLauncher-0] ERROR org.apache.pig.backend.hadoop.executionengine.tez.TezJob - Cannot submit DAG - Application id: application_1500373976542_0323
java.io.EOFException: End of File Exception between local host is: "FRCCH1BASAPY02.ALICO.CORP/10.64.34.22"; destination host is: "FRCCH1BASAPY02.ALICO.CORP":45770; : java.io.EOFException; For more details see: http://wiki.apache.org/hadoop/EOFException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:801)
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:765)
at org.apache.hadoop.ipc.Client.call(Client.java:1430)
at org.apache.hadoop.ipc.Client.call(Client.java:1363)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
at com.sun.proxy.$Proxy38.submitDAG(Unknown Source)
at org.apache.tez.client.TezClient.submitDAGSession(TezClient.java:520)
at org.apache.tez.client.TezClient.submitDAG(TezClient.java:436)
at org.apache.pig.backend.hadoop.executionengine.tez.TezJob.run(TezJob.java:161)
at org.apache.pig.backend.hadoop.executionengine.tez.TezLauncher$1.run(TezLauncher.java:206)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.EOFException
at java.io.DataInputStream.readInt(DataInputStream.java:392)
at org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:1087)
at org.apache.hadoop.ipc.Client$Connection.run(Client.java:982)
2017-07-24 13:07:14,234 [main] INFO org.apache.pig.tools.pigstats.tez.TezPigScriptStats - Script Statistics:
HadoopVersion: 2.7.1.2.4.2.0-258
PigVersion: 0.15.0.2.4.2.0-258
TezVersion: 0.7.0.2.4.2.0-258
UserId: yarn
FileName: ITA_02_PRE_LEAD.pig
StartedAt: 2017-07-24 13:06:22
FinishedAt: 2017-07-24 13:07:14
Features: RANK,FILTER,UNION
Failed!
DAG PigLatin:ITA_02_PRE_LEAD.pig-0_scope-0:
ApplicationId: null
TotalLaunchedTasks: -1
FileBytesRead: -1
FileBytesWritten: -1
HdfsBytesRead: 0
HdfsBytesWritten: 0
From our troubleshooting, the error seems to occur when we use the UNION operator on 6 different sources:
TA_EXIT_FILE = UNION
ITA_CFI_EXIT_FILE_RANKED_NOHEADER_WITH_SOURCE,
ITA_COLLIGO_EXIT_FILE_RANKED_NOHEADER_WITH_SOURCE,
ITA_CONTACTA_EXIT_FILE_RANKED_NOHEADER_WITH_SOURCE,
ITA_ECN_EXIT_FILE_RANKED_NOHEADER_WITH_SOURCE,
ITA_RBS_EXIT_FILE_RANKED_NOHEADER_WITH_SOURCE,
ITA_WAVE_EXIT_FILE_RANKED_NOHEADER_WITH_SOURCE
;
If we remove any one source from the UNION, the script runs in the Oozie workflow. Do you have any idea what could be causing the issue?
07-21-2017
03:21 PM
Hi guys. I've been struggling with one issue for quite some time now. I have a Pig script, quite long but not too elaborate, where I load data from 5 different external tables with the same structure, union the tables, do a bit of cleansing, group to eliminate duplicates, and then store the results into a Hive table USING org.apache.hive.hcatalog.pig.HCatStorer(). Everything works until I add another table with the same structure as the others; then the script fails without producing any useful error message. If I dump the output instead, everything runs smoothly.
Error from log:
ERROR 2244: Job failed, hadoop does not return any error message
Can someone help? Thank you.