Support Questions

tbt0127 · 08-08-2018

I'm using Hive (on YARN), installed from CDH-5.14.2-1, and I created a database that stores purchase history. The table that holds the purchase history has 1,000,000,000 rows.

I ran the following query to measure Hive's performance.

SELECT c.gender, 
       g.NAME, 
       i.NAME, 
       Sum(b.num) 
FROM   customers c 
       JOIN boughts_bil b 
         ON ( c.id = b.cus_id 
              AND b.id < $var ) 
       JOIN items i 
         ON ( i.id = b.item_id ) 
       JOIN genres g 
         ON ( g.id = i.gen_id ) 
GROUP  BY c.gender, 
          g.NAME, 
          i.NAME;

Incidentally, since I want to test without any optimization, I did not create any partitions.

When I set $var to 30,000,000, the query fails with the error "Execution Error, return code 2 from org.apache.hadoop.hive.ql.exe". I ran the same query before, and at that time it worked fine.

When the query worked, our Cloudera edition was Express, but it has since changed to Enterprise. Could that be the cause?

Or could there be another reason, for example an out-of-memory error?

Any advice would be appreciated.

Thanks.

Addition

I checked the HistoryServer, and it shows the following:

Diagnostics: 
Application failed due to failed ApplicationMaster. 
Only partial information is available; some values may be inaccurate.

I'll check the table values.

EricL · 08-20-2018

Did it fail on the MapReduce side? We need to collect the YARN application logs to find the exact error message. Have you tried running:

yarn logs -applicationId {application_id} -appOwner {username}

to collect the log and examine the output?
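Because the aggregated logs can be very long, it may help to filter them for the lines that usually matter. The following is a minimal Python sketch and not part of the original reply. It assumes the `yarn` CLI is on the PATH and that log aggregation is enabled; the helper names `fetch_yarn_logs` and `interesting_lines` and the keyword list are hypothetical choices made for illustration.

```python
import subprocess

# Keywords chosen for illustration only; adjust them to your own logs.
KEYWORDS = ("ERROR", "FATAL", "Exception", "exit code")


def fetch_yarn_logs(application_id, app_owner=None):
    """Run `yarn logs` for one application and return its output as text."""
    cmd = ["yarn", "logs", "-applicationId", application_id]
    if app_owner:
        cmd += ["-appOwner", app_owner]
    result = subprocess.run(cmd, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT,
                            universal_newlines=True)
    return result.stdout


def interesting_lines(log_text, keywords=KEYWORDS):
    """Return the log lines that contain any of the keywords."""
    return [line for line in log_text.splitlines()
            if any(k in line for k in keywords)]
```

For example, `interesting_lines(fetch_yarn_logs("{application_id}", "{username}"))` would return only the matching lines, which is usually a shorter starting point than the full log.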

tbt0127 · 08-21-2018

Thanks for replying, and sorry that I mis-clicked "accept as solution".

Below are the output of a run and the YARN log.

The run output:

Query ID = ..._20180813111111_92d8a1f2-4614-49c6-8833-d7b2e709c79c
Total jobs = 2
Stage-1 is selected by condition resolver.
Launching Job 1 out of 2
Starting Job = job_1534123434864_0480, Tracking URL = http://...:8088/proxy/application_1534123434864_0480/
Kill Command = /.../hadoop job  -kill job_1534123434864_0480
Hadoop job information for Stage-1: number of mappers: 140; number of reducers: 557
2018-08-13 11:11:49,795 Stage-1 map = 0%,  reduce = 0%
...
2018-08-13 11:15:45,128 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 3475.74 sec
MapReduce Total cumulative CPU time: 57 minutes 55 seconds 740 msec
Ended Job = job_1534123434864_0480
Execution log at: /.../..._20180813111111_92d8a1f2-4614-49c6-8833-d7b2e709c79c.log
2018-08-13 11:15:51 Starting to launch local task to process map join;  maximum memory = 1908932608
2018-08-13 11:15:52 Dump the side-table for tag: 1 with group count: 24 into file: file:/.../c33533aa-7637-4034-a3d1-2e8b857c2820/hive_2018-08-13_11-11-38_070_2752807246292956243-1/-local-10006/HashTable-Stage-4/MapJoin-mapfile01--.hashtable
2018-08-13 11:15:52 Uploaded 1 File to: file:/.../c33533aa-7637-4034-a3d1-2e8b857c2820/hive_2018-08-13_11-11-38_070_2752807246292956243-1/-local-10006/HashTable-Stage-4/MapJoin-mapfile01--.hashtable (902 bytes)
2018-08-13 11:15:52 Dump the side-table for tag: 1 with group count: 3500 into file: file:/.../c33533aa-7637-4034-a3d1-2e8b857c2820/hive_2018-08-13_11-11-38_070_2752807246292956243-1/-local-10006/HashTable-Stage-4/MapJoin-mapfile11--.hashtable
2018-08-13 11:15:52 Uploaded 1 File to: file:/.../c33533aa-7637-4034-a3d1-2e8b857c2820/hive_2018-08-13_11-11-38_070_2752807246292956243-1/-local-10006/HashTable-Stage-4/MapJoin-mapfile11--.hashtable (107794 bytes)
2018-08-13 11:15:52 End of local task; Time Taken: 1.54 sec.
Execution completed successfully
MapredLocal task succeeded
Launching Job 2 out of 2
Number of reduce tasks not specified. Estimated from input data size: 1
Starting Job = job_1534123434864_0536, Tracking URL = http://...:8088/proxy/application_1534123434864_0536/
Kill Command = /.../hadoop job  -kill job_1534123434864_0536
Hadoop job information for Stage-4: number of mappers: 4; number of reducers: 1
2018-08-13 11:16:23,048 Stage-4 map = 0%,  reduce = 0%
2018-08-13 11:16:44,240 Stage-4 map = 25%,  reduce = 0%, Cumulative CPU 2.28 sec
2018-08-13 11:16:46,330 Stage-4 map = 50%,  reduce = 0%, Cumulative CPU 5.06 sec
2018-08-13 11:16:49,473 Stage-4 map = 75%,  reduce = 0%, Cumulative CPU 9.58 sec
2018-08-13 11:16:50,520 Stage-4 map = 100%,  reduce = 0%, Cumulative CPU 15.14 sec
2018-08-13 11:17:12,471 Stage-4 map = 0%,  reduce = 0%
2018-08-13 11:17:42,680 Stage-4 map = 25%,  reduce = 0%, Cumulative CPU 2.2 sec
2018-08-13 11:17:44,779 Stage-4 map = 50%,  reduce = 0%, Cumulative CPU 5.25 sec
2018-08-13 11:17:46,873 Stage-4 map = 100%,  reduce = 0%, Cumulative CPU 15.0 sec
2018-08-13 11:18:12,006 Stage-4 map = 0%,  reduce = 0%
MapReduce Total cumulative CPU time: 15 seconds 0 msec
Ended Job = job_1534123434864_0536 with errors
Error during job, obtaining debugging information...
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 140  Reduce: 557   Cumulative CPU: 3475.74 sec   HDFS Read: 37355213704 HDFS Write: 56143 SUCCESS
Stage-Stage-4: Map: 4  Reduce: 1   Cumulative CPU: 15.0 sec   HDFS Read: 0 HDFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 58 minutes 10 seconds 740 msec
WARN: The method class org.apache.commons.logging.impl.SLF4JLogFactory#release() was invoked.
WARN: Please see http://www.slf4j.org/codes.html#release for an explanation.

I then ran

yarn logs -applicationId application_1534123434864_0480

and found several kinds of errors in container_1534123434864_0480_02_000001:

(1)ERROR [RMCommunicator Allocator] 
org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: 
Container complete event for unknown container container_1534123434864_0480_02_000143


(2)INFO [AsyncDispatcher event handler]
org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1534123434864_0480_r_000014_1000: 
Container killed on request. Exit code is 137
Container exited with a non-zero exit code 137
Killed by external signal

(3)INFO [AsyncDispatcher event handler]
org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl:
Diagnostics report from attempt_1534123434864_0480_r_000041_1000:
Container exited with a non-zero exit code 154

(4)ERROR [ContainerLauncher #1] 
org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: 
Container launch failed for container_1534123434864_0480_02_000241 : 
java.io.IOException: Failed on local exception: java.io.IOException: java.io.IOException: 
Connection reset from partner; Host Details : local host is: "node3"; destination host is: "node2":8041; 
    at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:772)
    at org.apache.hadoop.ipc.Client.call(Client.java:1508)
    at org.apache.hadoop.ipc.Client.call(Client.java:1441)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
    at com.sun.proxy.$Proxy40.startContainers(Unknown Source)
    at org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagementProtocolPBClientImpl.startContainers(ContainerManagementProtocolPBClientImpl.java:96)
    at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:258)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)
    at com.sun.proxy.$Proxy41.startContainers(Unknown Source)
    at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$Container.launch(ContainerLauncherImpl.java:155)
    at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:379)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: java.io.IOException: Connection reset from partner
    at org.apache.hadoop.ipc.Client$Connection$1.run(Client.java:718)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1920)
    at org.apache.hadoop.ipc.Client$Connection.handleSaslConnectionFailure(Client.java:681)
    at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:769)
    at org.apache.hadoop.ipc.Client$Connection.access$3000(Client.java:396)
    at org.apache.hadoop.ipc.Client.getConnection(Client.java:1557)
    at org.apache.hadoop.ipc.Client.call(Client.java:1480)
    ... 15 more
Caused by: java.io.IOException: Connection reset from partner
    at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
    at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
    at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
    at sun.nio.ch.IOUtil.read(IOUtil.java:197)
    at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
    at org.apache.hadoop.net.SocketInputStream$Reader.performIO(SocketInputStream.java:57)
    at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
    at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
    at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
    at java.io.FilterInputStream.read(FilterInputStream.java:133)
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:254)
    at java.io.DataInputStream.readInt(DataInputStream.java:387)
    at org.apache.hadoop.security.SaslRpcClient.saslConnect(SaslRpcClient.java:370)
    at org.apache.hadoop.ipc.Client$Connection.setupSaslConnection(Client.java:594)
    at org.apache.hadoop.ipc.Client$Connection.access$2000(Client.java:396)
    at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:761)
    at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:757)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1920)
    at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:756)
    ... 18 more
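A note on reading the exit code in (2): by the usual POSIX shell convention, an exit status above 128 means the process was terminated by signal number (status - 128), so 137 corresponds to signal 9, SIGKILL. That matches the "Killed by external signal" message, but it does not by itself say who sent the signal or why. The minimal Python sketch below shows the arithmetic. The function name `signal_from_exit_code` is made up for illustration, signal numbering is platform-dependent, and the convention is not guaranteed to apply to every code a container reports (for instance, it should not be assumed for the 154 in (3)).

```python
import signal


def signal_from_exit_code(exit_code):
    """Return the signal name implied by an exit status above 128, else None."""
    if exit_code <= 128:
        return None
    try:
        # On POSIX systems, status = 128 + signal number for a killed process.
        return signal.Signals(exit_code - 128).name
    except ValueError:
        # The number does not map to a known signal on this platform.
        return None
```

On a Linux host, `signal_from_exit_code(137)` gives `"SIGKILL"`. Treat the result as a hint for where to look next, not as a diagnosis.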

EricL · 08-23-2018

Hmm, job job_1534123434864_0480 finished successfully. I think you should check the log for job job_1534123434864_0536 instead.
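In the Hive console output above, the line ending in "with errors" marks the failing job, and the tracking URLs show that the matching YARN application ID uses the same numbers with an `application_` prefix. A small Python sketch for picking that ID out of saved console output could look like this; the function name `failed_application_ids` is hypothetical, and the pattern assumes the "Ended Job = ... with errors" wording shown in this thread.

```python
import re

# Matches lines such as "Ended Job = job_<timestamp>_<sequence> with errors".
FAILED_JOB = re.compile(r"Ended Job = (job_\d+_\d+) with errors")


def failed_application_ids(hive_output):
    """Return the YARN application IDs of jobs that Hive reported as failed."""
    return [m.group(1).replace("job_", "application_", 1)
            for m in FAILED_JOB.finditer(hive_output)]
```

Applied to the output posted earlier, this would select application_1534123434864_0536, which is the ID to pass to `yarn logs -applicationId`.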

tbt0127 · 08-28-2018

I couldn't find any errors in job job_1534123434864_0536, so I uninstalled and reinstalled Cloudera Manager, and now the query works well.

Thanks for helping.

Cloudera Community

Hive query stops with Error "Execution Error, return code 2 from org.apache.hadoop.hive.ql.exe"