Support Questions

Find answers, ask questions, and share your expertise

Hive query stops with Error "Execution Error, return code 2 from org.apache.hadoop.hive.ql.exe"

avatar
Explorer

I'm using Hive(with Yarn) that is installed by CDH-5.14.2-1, and made a database which keeps purchase history. One table which has purchase history has 1,000,000,000 tuples.

I tried the following query to measure Hive's performance.

 

SELECT c.gender, 
       g.NAME, 
       i.NAME, 
       Sum(b.num) 
FROM   customers c 
       JOIN boughts_bil b 
         ON ( c.id = b.cus_id 
              AND b.id < $var ) 
       JOIN items i 
         ON ( i.id = b.item_id ) 
       JOIN genres g 
         ON ( g.id = i.gen_id ) 
GROUP  BY c.gender, 
          g.NAME, 
          i.NAME; 

Incidentally, since I want to try with no optimization, I made no partitions.

 

When I set "$var=30,000,000", the error "Execution Error, return code 2 from org.apache.hadoop.hive.ql.exe" has occurred. In reality, I use the same query and that time it worked fine.

 

Cloudera's plan was Express when it was going well, but now the plan became Enterprise-only. Is it cause?

Or are there different reasons for example out of memory error.

 

Please give your wisdom.

 

Thanks.

 

addition

I checked HistoryServer and write like below

 

Diagnostics: 
Application failed due to failed ApplicationMaster.
Only partial information is available; some values may be inaccurate.

I'll check the table value.

4 REPLIES 4

avatar
Super Guru
Is it failed at MR side? We need to collect the YARN application logs and find out the exact message. Have you tried to run:

yarn logs -applicationId {application_id} -appOwner {username}

to collect the log and examine the output?

avatar
Explorer

Thanks for replying. And sorry that I miss clicked "accept as solution" .

 

I show a result of a run and a yarn log.

The run result is below

Query ID = ..._20180813111111_92d8a1f2-4614-49c6-8833-d7b2e709c79c
Total jobs = 2
Stage-1 is selected by condition resolver.
Launching Job 1 out of 2
Starting Job = job_1534123434864_0480, Tracking URL = http://...:8088/proxy/application_1534123434864_0480/
Kill Command = /.../hadoop job  -kill job_1534123434864_0480
Hadoop job information for Stage-1: number of mappers: 140; number of reducers: 557
2018-08-13 11:11:49,795 Stage-1 map = 0%,  reduce = 0%
... 2018-08-13 11:15:45,128 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 3475.74 sec MapReduce Total cumulative CPU time: 57 minutes 55 seconds 740 msec Ended Job = job_1534123434864_0480 Execution log at: /.../..._20180813111111_92d8a1f2-4614-49c6-8833-d7b2e709c79c.log 2018-08-13 11:15:51 Starting to launch local task to process map join; maximum memory = 1908932608 2018-08-13 11:15:52 Dump the side-table for tag: 1 with group count: 24 into file: file:/.../c33533aa-7637-4034-a3d1-2e8b857c2820/hive_2018-08-13_11-11-38_070_2752807246292956243-1/-local-10006/HashTable-Stage-4/MapJoin-mapfile01--.hashtable 2018-08-13 11:15:52 Uploaded 1 File to: file:/.../c33533aa-7637-4034-a3d1-2e8b857c2820/hive_2018-08-13_11-11-38_070_2752807246292956243-1/-local-10006/HashTable-Stage-4/MapJoin-mapfile01--.hashtable (902 bytes) 2018-08-13 11:15:52 Dump the side-table for tag: 1 with group count: 3500 into file: file:/.../c33533aa-7637-4034-a3d1-2e8b857c2820/hive_2018-08-13_11-11-38_070_2752807246292956243-1/-local-10006/HashTable-Stage-4/MapJoin-mapfile11--.hashtable 2018-08-13 11:15:52 Uploaded 1 File to: file:/.../c33533aa-7637-4034-a3d1-2e8b857c2820/hive_2018-08-13_11-11-38_070_2752807246292956243-1/-local-10006/HashTable-Stage-4/MapJoin-mapfile11--.hashtable (107794 bytes) 2018-08-13 11:15:52 End of local task; Time Taken: 1.54 sec. Execution completed successfully MapredLocal task succeeded Launching Job 2 out of 2 Number of reduce tasks not specified. Estimated from input data size: 1 Starting Job = job_1534123434864_0536, Tracking URL = http://...:8088/proxy/application_1534123434864_0536/ Kill Command = /.../hadoop job -kill job_1534123434864_0536 Hadoop job information for Stage-4: number of mappers: 4; number of reducers: 1 2018-08-13 11:16:23,048 Stage-4 map = 0%, reduce = 0% 2018-08-13 11:16:44,240 Stage-4 map = 25%, reduce = 0%, Cumulative CPU 2.28 sec 2018-08-13 11:16:46,330 Stage-4 map = 50%, reduce = 0%, Cumulative CPU 5.06 sec 2018-08-13 11:16:49,473 Stage-4 map = 75%, reduce = 0%, Cumulative CPU 9.58 sec 2018-08-13 11:16:50,520 Stage-4 map = 100%, reduce = 0%, Cumulative CPU 15.14 sec 2018-08-13 11:17:12,471 Stage-4 map = 0%, reduce = 0% 2018-08-13 11:17:42,680 Stage-4 map = 25%, reduce = 0%, Cumulative CPU 2.2 sec 2018-08-13 11:17:44,779 Stage-4 map = 50%, reduce = 0%, Cumulative CPU 5.25 sec 2018-08-13 11:17:46,873 Stage-4 map = 100%, reduce = 0%, Cumulative CPU 15.0 sec 2018-08-13 11:18:12,006 Stage-4 map = 0%, reduce = 0% MapReduce Total cumulative CPU time: 15 seconds 0 msec Ended Job = job_1534123434864_0536 with errors Error during job, obtaining debugging information... FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask MapReduce Jobs Launched: Stage-Stage-1: Map: 140 Reduce: 557 Cumulative CPU: 3475.74 sec HDFS Read: 37355213704 HDFS Write: 56143 SUCCESS Stage-Stage-4: Map: 4 Reduce: 1 Cumulative CPU: 15.0 sec HDFS Read: 0 HDFS Write: 0 FAIL Total MapReduce CPU Time Spent: 58 minutes 10 seconds 740 msec WARN: The method class org.apache.commons.logging.impl.SLF4JLogFactory#release() was invoked. WARN: Please see http://www.slf4j.org/codes.html#release for an explanation.

I checked 

yarn logs -applicationId application_1534123434864_0480

And there are some kinds of Errors in container_1534123434864_0480_02_000001

(1)ERROR [RMCommunicator Allocator] 
org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: 
Container complete event for unknown container container_1534123434864_0480_02_000143


(2)INFO [AsyncDispatcher event handler]
org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1534123434864_0480_r_000014_1000: 
Container killed on request. Exit code is 137
Container exited with a non-zero exit code 137
Killed by external signal

(3)INFO [AsyncDispatcher event handler]
org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl:
Diagnostics report from attempt_1534123434864_0480_r_000041_1000:
Container exited with a non-zero exit code 154

(4)ERROR [ContainerLauncher #1] 
org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: 
Container launch failed for container_1534123434864_0480_02_000241 : 
java.io.IOException: Failed on local exception: java.io.IOException: java.io.IOException: 
Connection reset from partner; Host Details : local host is: "node3"; destination host is: "node2":8041; 
    at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:772)
    at org.apache.hadoop.ipc.Client.call(Client.java:1508)
    at org.apache.hadoop.ipc.Client.call(Client.java:1441)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
    at com.sun.proxy.$Proxy40.startContainers(Unknown Source)
    at org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagementProtocolPBClientImpl.startContainers(ContainerManagementProtocolPBClientImpl.java:96)
    at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:258)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)
    at com.sun.proxy.$Proxy41.startContainers(Unknown Source)
    at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$Container.launch(ContainerLauncherImpl.java:155)
    at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:379)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: java.io.IOException: Connection reset from partner
    at org.apache.hadoop.ipc.Client$Connection$1.run(Client.java:718)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1920)
    at org.apache.hadoop.ipc.Client$Connection.handleSaslConnectionFailure(Client.java:681)
    at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:769)
    at org.apache.hadoop.ipc.Client$Connection.access$3000(Client.java:396)
    at org.apache.hadoop.ipc.Client.getConnection(Client.java:1557)
    at org.apache.hadoop.ipc.Client.call(Client.java:1480)
    ... 15 more
Caused by: java.io.IOException: Connection reset from partner
    at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
    at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
    at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
    at sun.nio.ch.IOUtil.read(IOUtil.java:197)
    at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
    at org.apache.hadoop.net.SocketInputStream$Reader.performIO(SocketInputStream.java:57)
    at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
    at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
    at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
    at java.io.FilterInputStream.read(FilterInputStream.java:133)
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:254)
    at java.io.DataInputStream.readInt(DataInputStream.java:387)
    at org.apache.hadoop.security.SaslRpcClient.saslConnect(SaslRpcClient.java:370)
    at org.apache.hadoop.ipc.Client$Connection.setupSaslConnection(Client.java:594)
    at org.apache.hadoop.ipc.Client$Connection.access$2000(Client.java:396)
    at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:761)
    at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:757)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1920)
    at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:756)
    ... 18 more

 

avatar
Super Guru
hmm, job job_1534123434864_0480 finished successfully, I think you should check the log for job job_1534123434864_0536 instead.

avatar
Explorer

I couldn't find any error in job job_1534123434864_0536. So I uninstall cloudera manager and reinstall, then it works well.

Thanks for helping.