Pig script error when storing results into Hive tables


New Contributor

Hi guys.

I've been struggling with one issue for quite some time now. I have a Pig script, quite long but not too elaborate, that loads data from 5 different external tables with the same structure, unions the tables, does a bit of cleansing, groups the rows to eliminate duplicates, and then stores the results into a Hive table using org.apache.hive.hcatalog.pig.HCatStorer(). Everything worked until I added another table with the same structure as the others; now the script fails without producing any useful error. If I dump the output instead of storing it, everything runs smoothly.
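
For reference, the flow described above looks roughly like the following minimal sketch. The table names (db.source_a, db.source_b, db.target) are hypothetical, only two of the sources are shown, the cleansing step is omitted, and DISTINCT stands in for the group-based de-duplication used in the real script.

```pig
-- Hypothetical Hive tables; the real script loads several sources with the same structure.
source_a = LOAD 'db.source_a' USING org.apache.hive.hcatalog.pig.HCatLoader();
source_b = LOAD 'db.source_b' USING org.apache.hive.hcatalog.pig.HCatLoader();

-- Combine the sources, which share one schema.
combined = UNION source_a, source_b;

-- Drop exact duplicate rows (a stand-in for the GROUP-based approach).
deduped = DISTINCT combined;

-- Write the result to the target Hive table; its columns must match the relation's schema.
STORE deduped INTO 'db.target' USING org.apache.hive.hcatalog.pig.HCatStorer();
```

Replacing the STORE line with DUMP deduped; corresponds to the variant that runs without problems.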

Error from log:

ERROR 2244: Job failed, hadoop does not return any error message

Can someone help?

Thank you.


Re: Pig script error when storing results into Hive tables

@Gustavo Bras

Open a grunt shell with HCatalog support:

pig -useHCatalog

Paste the commands you wrote in the script.

Look at the error generated by the dump statement.

Post that error here.
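
As a rough illustration of the steps above, a grunt session (started with pig -useHCatalog) might look like this sketch. The table name db.source_a and the alias names are hypothetical; substitute the statements from the actual script.

```pig
-- Entered at the grunt> prompt after starting: pig -useHCatalog
-- 'db.source_a' is a hypothetical table name.
source_a = LOAD 'db.source_a' USING org.apache.hive.hcatalog.pig.HCatLoader();

-- Keep the output small while debugging.
preview = LIMIT source_a 10;

-- DUMP triggers execution, so any failure is reported here in the console.
DUMP preview;
```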

Thanks

Re: Pig script error when storing results into Hive tables

New Contributor

Thank you @rbiswas for your reply.

When we run the script in the grunt shell (pig -useHCatalog), it works without problems. We ran it both by entering the statements line by line in the grunt shell and by running "pig -useHCatalog hdfs:path/to/script"; both ways worked.

When we run it in an Oozie workflow, we get the following error:

2017-07-24 13:07:14,194 [PigTezLauncher-0] ERROR org.apache.pig.backend.hadoop.executionengine.tez.TezJob  - Cannot submit DAG - Application id: application_1500373976542_0323
java.io.EOFException: End of File Exception between local host is: "FRCCH1BASAPY02.ALICO.CORP/10.64.34.22"; destination host is: "FRCCH1BASAPY02.ALICO.CORP":45770; : java.io.EOFException; For more details see:  http://wiki.apache.org/hadoop/EOFException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:801)
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:765)
at org.apache.hadoop.ipc.Client.call(Client.java:1430)
at org.apache.hadoop.ipc.Client.call(Client.java:1363)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
at com.sun.proxy.$Proxy38.submitDAG(Unknown Source)
at org.apache.tez.client.TezClient.submitDAGSession(TezClient.java:520)
at org.apache.tez.client.TezClient.submitDAG(TezClient.java:436)
at org.apache.pig.backend.hadoop.executionengine.tez.TezJob.run(TezJob.java:161)
at org.apache.pig.backend.hadoop.executionengine.tez.TezLauncher$1.run(TezLauncher.java:206)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.EOFException
at java.io.DataInputStream.readInt(DataInputStream.java:392)
at org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:1087)
at org.apache.hadoop.ipc.Client$Connection.run(Client.java:982)
2017-07-24 13:07:14,234 [main] INFO  org.apache.pig.tools.pigstats.tez.TezPigScriptStats  - Script Statistics:
       HadoopVersion: 2.7.1.2.4.2.0-258                                                                                  
          PigVersion: 0.15.0.2.4.2.0-258                                                                                 
          TezVersion: 0.7.0.2.4.2.0-258                                                                                  
              UserId: yarn                                                                                               
            FileName: ITA_02_PRE_LEAD.pig                                                                                
           StartedAt: 2017-07-24 13:06:22                                                                               
          FinishedAt: 2017-07-24 13:07:14                                                                               
            Features: RANK,FILTER,UNION                                                                                 
Failed!
DAG PigLatin:ITA_02_PRE_LEAD.pig-0_scope-0:
       ApplicationId: null                                                                                                
  TotalLaunchedTasks: -1                                                                                                 
       FileBytesRead: -1                                                                                                  
    FileBytesWritten: -1                                                                                                 
       HdfsBytesRead: 0                                                                                                  
    HdfsBytesWritten: 0     

From our troubleshooting, the error seems to occur when we use the UNION operator on 6 different sources:

ITA_EXIT_FILE = UNION
                  ITA_CFI_EXIT_FILE_RANKED_NOHEADER_WITH_SOURCE,
                  ITA_COLLIGO_EXIT_FILE_RANKED_NOHEADER_WITH_SOURCE,
                  ITA_CONTACTA_EXIT_FILE_RANKED_NOHEADER_WITH_SOURCE,
                  ITA_ECN_EXIT_FILE_RANKED_NOHEADER_WITH_SOURCE,
                  ITA_RBS_EXIT_FILE_RANKED_NOHEADER_WITH_SOURCE,
                  ITA_WAVE_EXIT_FILE_RANKED_NOHEADER_WITH_SOURCE
                  ;

If we remove any one source from the UNION, the script runs in the Oozie workflow.

Do you have any idea what could be causing the issue?
