Created on 08-08-2018 12:07 AM - edited 09-16-2022 06:34 AM
I'm using Hive (on YARN) installed from CDH-5.14.2-1, and I created a database that stores purchase history. The purchase-history table has 1,000,000,000 rows.
I tried the following query to measure Hive's performance.
SELECT c.gender,
       g.NAME,
       i.NAME,
       Sum(b.num)
FROM   customers c
       JOIN boughts_bil b
         ON ( c.id = b.cus_id
              AND b.id < $var )
       JOIN items i
         ON ( i.id = b.item_id )
       JOIN genres g
         ON ( g.id = i.gen_id )
GROUP  BY c.gender,
          g.NAME,
          i.NAME;
Incidentally, since I want to test without any optimization, I did not partition the tables.
When I set "$var=30,000,000", the error "Execution Error, return code 2 from org.apache.hadoop.hive.ql.exe" occurred. I had run the same query before, and at that time it worked fine.
The Cloudera license was Express when the query worked, but it has since changed to Enterprise-only. Could that be the cause?
Or is there a different reason, for example an out-of-memory error?
Any advice would be appreciated.
Thanks.
Addition:
I checked the HistoryServer, and it showed the following:
Diagnostics:
Application failed due to failed ApplicationMaster.
Only partial information is available; some values may be inaccurate.
I'll check the table value.
Created 08-20-2018 11:49 PM
Created on 08-21-2018 12:04 AM - edited 08-21-2018 07:29 AM
Thanks for replying, and sorry that I clicked "accept as solution" by mistake.
Here are the result of a run and the YARN log.
The run result is below:
Query ID = ..._20180813111111_92d8a1f2-4614-49c6-8833-d7b2e709c79c
Total jobs = 2
Stage-1 is selected by condition resolver.
Launching Job 1 out of 2
Starting Job = job_1534123434864_0480, Tracking URL = http://...:8088/proxy/application_1534123434864_0480/
Kill Command = /.../hadoop job -kill job_1534123434864_0480
Hadoop job information for Stage-1: number of mappers: 140; number of reducers: 557
2018-08-13 11:11:49,795 Stage-1 map = 0%, reduce = 0%
...
2018-08-13 11:15:45,128 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 3475.74 sec
MapReduce Total cumulative CPU time: 57 minutes 55 seconds 740 msec
Ended Job = job_1534123434864_0480
Execution log at: /.../..._20180813111111_92d8a1f2-4614-49c6-8833-d7b2e709c79c.log
2018-08-13 11:15:51 Starting to launch local task to process map join; maximum memory = 1908932608
2018-08-13 11:15:52 Dump the side-table for tag: 1 with group count: 24 into file: file:/.../c33533aa-7637-4034-a3d1-2e8b857c2820/hive_2018-08-13_11-11-38_070_2752807246292956243-1/-local-10006/HashTable-Stage-4/MapJoin-mapfile01--.hashtable
2018-08-13 11:15:52 Uploaded 1 File to: file:/.../c33533aa-7637-4034-a3d1-2e8b857c2820/hive_2018-08-13_11-11-38_070_2752807246292956243-1/-local-10006/HashTable-Stage-4/MapJoin-mapfile01--.hashtable (902 bytes)
2018-08-13 11:15:52 Dump the side-table for tag: 1 with group count: 3500 into file: file:/.../c33533aa-7637-4034-a3d1-2e8b857c2820/hive_2018-08-13_11-11-38_070_2752807246292956243-1/-local-10006/HashTable-Stage-4/MapJoin-mapfile11--.hashtable
2018-08-13 11:15:52 Uploaded 1 File to: file:/.../c33533aa-7637-4034-a3d1-2e8b857c2820/hive_2018-08-13_11-11-38_070_2752807246292956243-1/-local-10006/HashTable-Stage-4/MapJoin-mapfile11--.hashtable (107794 bytes)
2018-08-13 11:15:52 End of local task; Time Taken: 1.54 sec.
Execution completed successfully
MapredLocal task succeeded
Launching Job 2 out of 2
Number of reduce tasks not specified. Estimated from input data size: 1
Starting Job = job_1534123434864_0536, Tracking URL = http://...:8088/proxy/application_1534123434864_0536/
Kill Command = /.../hadoop job -kill job_1534123434864_0536
Hadoop job information for Stage-4: number of mappers: 4; number of reducers: 1
2018-08-13 11:16:23,048 Stage-4 map = 0%, reduce = 0%
2018-08-13 11:16:44,240 Stage-4 map = 25%, reduce = 0%, Cumulative CPU 2.28 sec
2018-08-13 11:16:46,330 Stage-4 map = 50%, reduce = 0%, Cumulative CPU 5.06 sec
2018-08-13 11:16:49,473 Stage-4 map = 75%, reduce = 0%, Cumulative CPU 9.58 sec
2018-08-13 11:16:50,520 Stage-4 map = 100%, reduce = 0%, Cumulative CPU 15.14 sec
2018-08-13 11:17:12,471 Stage-4 map = 0%, reduce = 0%
2018-08-13 11:17:42,680 Stage-4 map = 25%, reduce = 0%, Cumulative CPU 2.2 sec
2018-08-13 11:17:44,779 Stage-4 map = 50%, reduce = 0%, Cumulative CPU 5.25 sec
2018-08-13 11:17:46,873 Stage-4 map = 100%, reduce = 0%, Cumulative CPU 15.0 sec
2018-08-13 11:18:12,006 Stage-4 map = 0%, reduce = 0%
MapReduce Total cumulative CPU time: 15 seconds 0 msec
Ended Job = job_1534123434864_0536 with errors
Error during job, obtaining debugging information...
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
MapReduce Jobs Launched:
Stage-Stage-1: Map: 140 Reduce: 557 Cumulative CPU: 3475.74 sec HDFS Read: 37355213704 HDFS Write: 56143 SUCCESS
Stage-Stage-4: Map: 4 Reduce: 1 Cumulative CPU: 15.0 sec HDFS Read: 0 HDFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 58 minutes 10 seconds 740 msec
WARN: The method class org.apache.commons.logging.impl.SLF4JLogFactory#release() was invoked.
WARN: Please see http://www.slf4j.org/codes.html#release for an explanation.
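A note on reading the run result above: the summary marks Stage-1 (job_1534123434864_0480) as SUCCESS and Stage-4 (job_1534123434864_0536) as FAIL, so the logs of the second job are probably the more relevant ones. The Tracking URLs in the output suggest that a MapReduce job ID maps to a YARN application ID by swapping the `job_` prefix for `application_`. Below is a minimal Python sketch of that lookup; the function name `failed_application_ids` and the `sample` string are hypothetical, and the regex assumes the exact "Ended Job = ... with errors" wording shown above.

```python
import re
from typing import List

# Matches only jobs that the Hive CLI reported as ended "with errors".
# The pattern is an assumption based on the console output in this thread.
FAILED_JOB = re.compile(r"Ended Job = (job_\d+_\d+) with errors")


def failed_application_ids(console_output: str) -> List[str]:
    """Return the YARN application ID for each job Hive reported as failed."""
    return [
        job_id.replace("job_", "application_", 1)
        for job_id in FAILED_JOB.findall(console_output)
    ]


# Two lines taken from the run result above, joined for illustration.
sample = (
    "Ended Job = job_1534123434864_0480\n"
    "Ended Job = job_1534123434864_0536 with errors\n"
)
print(failed_application_ids(sample))
```

The resulting ID could then be passed to `yarn logs -applicationId` to pull the logs of the job that actually failed.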
I checked the logs with:
yarn logs -applicationId application_1534123434864_0480
There are several kinds of errors in container_1534123434864_0480_02_000001:
(1) ERROR [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Container complete event for unknown container container_1534123434864_0480_02_000143

(2) INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1534123434864_0480_r_000014_1000: Container killed on request. Exit code is 137
Container exited with a non-zero exit code 137
Killed by external signal

(3) INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1534123434864_0480_r_000041_1000: Container exited with a non-zero exit code 154

(4) ERROR [ContainerLauncher #1] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Container launch failed for container_1534123434864_0480_02_000241 : java.io.IOException: Failed on local exception: java.io.IOException: java.io.IOException: Connection reset from partner; Host Details : local host is: "node3"; destination host is: "node2":8041;
    at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:772)
    at org.apache.hadoop.ipc.Client.call(Client.java:1508)
    at org.apache.hadoop.ipc.Client.call(Client.java:1441)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
    at com.sun.proxy.$Proxy40.startContainers(Unknown Source)
    at org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagementProtocolPBClientImpl.startContainers(ContainerManagementProtocolPBClientImpl.java:96)
    at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:258)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)
    at com.sun.proxy.$Proxy41.startContainers(Unknown Source)
    at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$Container.launch(ContainerLauncherImpl.java:155)
    at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:379)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: java.io.IOException: Connection reset from partner
    at org.apache.hadoop.ipc.Client$Connection$1.run(Client.java:718)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1920)
    at org.apache.hadoop.ipc.Client$Connection.handleSaslConnectionFailure(Client.java:681)
    at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:769)
    at org.apache.hadoop.ipc.Client$Connection.access$3000(Client.java:396)
    at org.apache.hadoop.ipc.Client.getConnection(Client.java:1557)
    at org.apache.hadoop.ipc.Client.call(Client.java:1480)
    ... 15 more
Caused by: java.io.IOException: Connection reset from partner
    at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
    at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
    at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
    at sun.nio.ch.IOUtil.read(IOUtil.java:197)
    at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
    at org.apache.hadoop.net.SocketInputStream$Reader.performIO(SocketInputStream.java:57)
    at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
    at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
    at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
    at java.io.FilterInputStream.read(FilterInputStream.java:133)
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:254)
    at java.io.DataInputStream.readInt(DataInputStream.java:387)
    at org.apache.hadoop.security.SaslRpcClient.saslConnect(SaslRpcClient.java:370)
    at org.apache.hadoop.ipc.Client$Connection.setupSaslConnection(Client.java:594)
    at org.apache.hadoop.ipc.Client$Connection.access$2000(Client.java:396)
    at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:761)
    at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:757)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1920)
    at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:756)
    ... 18 more
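About exit code 137 in message (2): under the common POSIX shell convention, an exit code above 128 means the process was terminated by signal number (code - 128), so 137 would correspond to signal 9 (SIGKILL). That matches the "Killed by external signal" line. One frequent reason for such a kill is a container exceeding its memory limit, but the log shown here does not confirm that. The sketch below only illustrates the arithmetic; `describe_exit_code` is a hypothetical helper, and the convention is an assumption that may not hold for YARN-specific codes such as the 154 in message (3).

```python
import signal


def describe_exit_code(code: int) -> str:
    """Interpret an exit code using the 128 + signal-number convention.

    This is an assumption about how the code was produced; frameworks
    can also define their own exit codes that happen to exceed 128.
    """
    if code > 128:
        try:
            return signal.Signals(code - 128).name
        except ValueError:
            return "unknown signal"
    return "not a signal exit"


print(describe_exit_code(137))  # on Linux: SIGKILL (137 - 128 = 9)
```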
Created 08-23-2018 05:44 PM
Created 08-28-2018 10:30 PM
I couldn't find any errors in job job_1534123434864_0536, so I uninstalled Cloudera Manager and reinstalled it. After that, the query works well.
Thanks for helping.