Member since: 09-29-2014
Posts: 224
Kudos Received: 11
Solutions: 10
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 729 | 01-24-2024 10:45 PM |
| | 3696 | 03-30-2022 08:56 PM |
| | 2952 | 08-12-2021 10:40 AM |
| | 7114 | 04-28-2021 01:30 AM |
| | 3575 | 09-27-2016 08:16 PM |
08-02-2021
02:30 PM
I have found that one of my CDH clusters is logging many errors on every DataNode; the error logs are below. Does anyone have experience with this kind of issue? Any advice would be appreciated.
2021-08-03 05:23:43,389 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving BP-2123011416-10.37.54.12-1457006347704:blk_3910061604_2849065475 src: /10.37.54.218:36088 dest: /10.37.54.218:1004
2021-08-03 05:23:43,700 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /10.37.54.218:36082, dest: /10.37.54.218:1004, bytes: 358, op: HDFS_WRITE, cliID: DFSClient_NONMAPREDUCE_-859199005_222, offset: 0, srvID: 44713da0-9f69-44ea-b6c0-8f7420a41f83, blockid: BP-2123011416-10.37.54.12-1457006347704:blk_3910061597_2849065468, duration: 59733778
2021-08-03 05:23:43,700 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder: BP-2123011416-10.37.54.12-1457006347704:blk_3910061597_2849065468, type=HAS_DOWNSTREAM_IN_PIPELINE terminating
2021-08-03 05:23:43,833 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /10.37.54.218:36088, dest: /10.37.54.218:1004, bytes: 309, op: HDFS_WRITE, cliID: DFSClient_NONMAPREDUCE_-859199005_222, offset: 0, srvID: 44713da0-9f69-44ea-b6c0-8f7420a41f83, blockid: BP-2123011416-10.37.54.12-1457006347704:blk_3910061604_2849065475, duration: 200220559
2021-08-03 05:23:43,833 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder: BP-2123011416-10.37.54.12-1457006347704:blk_3910061604_2849065475, type=HAS_DOWNSTREAM_IN_PIPELINE terminating
2021-08-03 05:23:44,044 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving BP-2123011416-10.37.54.12-1457006347704:blk_3910061619_2849065490 src: /10.37.54.15:59320 dest: /10.37.54.218:1004
2021-08-03 05:23:44,058 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /10.37.54.15:59320, dest: /10.37.54.218:1004, bytes: 112, op: HDFS_WRITE, cliID: DFSClient_NONMAPREDUCE_1165227557_139, offset: 0, srvID: 44713da0-9f69-44ea-b6c0-8f7420a41f83, blockid: BP-2123011416-10.37.54.12-1457006347704:blk_3910061619_2849065490, duration: 3752037
2021-08-03 05:23:44,058 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder: BP-2123011416-10.37.54.12-1457006347704:blk_3910061619_2849065490, type=HAS_DOWNSTREAM_IN_PIPELINE terminating
2021-08-03 05:23:45,037 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving BP-2123011416-10.37.54.12-1457006347704:blk_3910061679_2849065550 src: /10.37.54.218:36108 dest: /10.37.54.218:1004
2021-08-03 05:23:45,185 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /10.37.54.218:36108, dest: /10.37.54.218:1004, bytes: 1415899, op: HDFS_WRITE, cliID: DFSClient_NONMAPREDUCE_-1849481388_3452, offset: 0, srvID: 44713da0-9f69-44ea-b6c0-8f7420a41f83, blockid: BP-2123011416-10.37.54.12-1457006347704:blk_3910061679_2849065550, duration: 61038196
2021-08-03 05:23:45,185 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder: BP-2123011416-10.37.54.12-1457006347704:blk_3910061679_2849065550, type=HAS_DOWNSTREAM_IN_PIPELINE terminating
2021-08-03 05:23:45,497 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Moved BP-2123011416-10.37.54.12-1457006347704:blk_3802213701_2741214333 from /10.37.54.13:44312, delHint=6a0ea409-35ad-42c5-956d-44a5b9bd58a6
2021-08-03 05:23:45,703 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving BP-2123011416-10.37.54.12-1457006347704:blk_3910061646_2849065517 src: /10.37.54.216:54728 dest: /10.37.54.218:1004
2021-08-03 05:23:45,714 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Received BP-2123011416-10.37.54.12-1457006347704:blk_3910061646_2849065517 src: /10.37.54.216:54728 dest: /10.37.54.218:1004 of size 4786053
2021-08-03 05:23:45,998 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Moved BP-2123011416-10.37.54.12-1457006347704:blk_1842563008_775812314 from /10.37.54.13:50434, delHint=6a0ea409-35ad-42c5-956d-44a5b9bd58a6
2021-08-03 05:23:46,042 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: BlockSender.sendChunks() exception:
java.io.IOException: Broken pipe
at sun.nio.ch.FileChannelImpl.transferTo0(Native Method)
at sun.nio.ch.FileChannelImpl.transferToDirectlyInternal(FileChannelImpl.java:428)
at sun.nio.ch.FileChannelImpl.transferToDirectly(FileChannelImpl.java:493)
at sun.nio.ch.FileChannelImpl.transferTo(FileChannelImpl.java:608)
at org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:223)
at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendPacket(BlockSender.java:605)
at org.apache.hadoop.hdfs.server.datanode.BlockSender.doSendBlock(BlockSender.java:789)
at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:736)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:551)
at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:148)
at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:103)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:246)
at java.lang.Thread.run(Thread.java:745)
2021-08-03 05:23:46,043 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: BlockSender.sendChunks() exception:
java.io.IOException: Broken pipe
at sun.nio.ch.FileChannelImpl.transferTo0(Native Method)
at sun.nio.ch.FileChannelImpl.transferToDirectlyInternal(FileChannelImpl.java:428)
at sun.nio.ch.FileChannelImpl.transferToDirectly(FileChannelImpl.java:493)
at sun.nio.ch.FileChannelImpl.transferTo(FileChannelImpl.java:608)
at org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:223)
at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendPacket(BlockSender.java:605)
at org.apache.hadoop.hdfs.server.datanode.BlockSender.doSendBlock(BlockSender.java:789)
at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:736)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:551)
at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:148)
at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:103)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:246)
at java.lang.Thread.run(Thread.java:745)
2021-08-03 05:23:47,003 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving BP-2123011416-10.37.54.12-1457006347704:blk_3910061723_2849065594 src: /10.37.54.216:54770 dest: /10.37.54.218:1004
2021-08-03 05:23:47,018 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving BP-2123011416-10.37.54.12-1457006347704:blk_3910061724_2849065595 src: /10.37.54.216:54772 dest: /10.37.54.218:1004
2021-08-03 05:23:47,019 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /10.37.54.216:54772, dest: /10.37.54.218:1004, bytes: 4158, op: HDFS_WRITE, cliID: DFSClient_NONMAPREDUCE_-1438538333_1, offset: 0, srvID: 44713da0-9f69-44ea-b6c0-8f7420a41f83, blockid: BP-2123011416-10.37.54.12-1457006347704:blk_3910061724_2849065595, duration: 1392081
2021-08-03 05:23:47,019 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder: BP-2123011416-10.37.54.12-1457006347704:blk_3910061724_2849065595, type=LAST_IN_PIPELINE, downstreams=0:[] terminating
2021-08-03 05:23:47,048 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving BP-2123011416-10.37.54.12-1457006347704:blk_3910061725_2849065596 src: /10.37.54.216:54774 dest: /10.37.54.218:1004
2021-08-03 05:23:47,056 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /10.37.54.216:54774, dest: /10.37.54.218:1004, bytes: 69, op: HDFS_WRITE, cliID: DFSClient_NONMAPREDUCE_1452909160_189, offset: 0, srvID: 44713da0-9f69-44ea-b6c0-8f7420a41f83, blockid: BP-2123011416-10.37.54.12-1457006347704:blk_3910061725_2849065596, duration: 7712861
2021-08-03 05:23:47,056 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder: BP-2123011416-10.37.54.12-1457006347704:blk_3910061725_2849065596, type=LAST_IN_PIPELINE, downstreams=0:[] terminating
2021-08-03 05:23:47,371 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving BP-2123011416-10.37.54.12-1457006347704:blk_3910061731_2849065602 src: /10.37.54.218:36198 dest: /10.37.54.218:1004
2021-08-03 05:23:47,407 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /10.37.54.218:36198, dest: /10.37.54.218:1004, bytes: 314, op: HDFS_WRITE, cliID: DFSClient_NONMAPREDUCE_466653976_222, offset: 0, srvID: 44713da0-9f69-44ea-b6c0-8f7420a41f83, blockid: BP-2123011416-10.37.54.12-1457006347704:blk_3910061731_2849065602, duration: 11069615
2021-08-03 05:23:47,407 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder: BP-2123011416-10.37.54.12-1457006347704:blk_3910061731_2849065602, type=HAS_DOWNSTREAM_IN_PIPELINE terminating
2021-08-03 05:23:47,422 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving BP-2123011416-10.37.54.12-1457006347704:blk_3910061732_2849065603 src: /10.37.54.218:36202 dest: /10.37.54.218:1004
2021-08-03 05:23:47,458 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /10.37.54.218:36202, dest: /10.37.54.218:1004, bytes: 17456, op: HDFS_WRITE, cliID: DFSClient_NONMAPREDUCE_466653976_222, offset: 0, srvID: 44713da0-9f69-44ea-b6c0-8f7420a41f83, blockid: BP-2123011416-10.37.54.12-1457006347704:blk_3910061732_2849065603, duration: 9623611
2021-08-03 05:23:47,458 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder: BP-2123011416-10.37.54.12-1457006347704:blk_3910061732_2849065603, type=HAS_DOWNSTREAM_IN_PIPELINE terminating
2021-08-03 05:23:47,497 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Moved BP-2123011416-10.37.54.12-1457006347704:blk_2529434549_1466543157 from /10.37.54.13:39396, delHint=6a0ea409-35ad-42c5-956d-44a5b9bd58a6
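The exception message above (originally logged in Chinese) is "Broken pipe": the DataNode's BlockSender was streaming block data when the reading peer closed its end of the TCP connection, so the kernel returned EPIPE on the next write. This is usually a client-side disconnect (a reader that timed out or was killed mid-transfer), not local disk corruption. As a minimal sketch, the same error class can be reproduced outside Hadoop with a plain socket pair:

```python
import socket

def provoke_broken_pipe():
    """Write into a socket whose peer has already closed, as happens when a
    DFS client disconnects while the DataNode is mid-transfer."""
    a, b = socket.socketpair()
    b.close()  # the "client" goes away
    try:
        for _ in range(32):
            # an early write may still land in the kernel buffer; a later one fails
            a.sendall(b"x" * 65536)
    except BrokenPipeError:
        # EPIPE: the same condition Java surfaces as IOException "Broken pipe"
        return True
    finally:
        a.close()
    return False

print(provoke_broken_pipe())
```

If these errors are only intermittent and writes keep succeeding (as the surrounding INFO lines suggest), they are typically benign; sustained bursts would point at readers timing out, e.g. due to slow disks or network issues.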
05-05-2021
07:52 PM
In my previous experience, I have never set the port range; the default range is 32768 to 65536. So my only question is: why can't ports near 1000 be connected to? Could you give me some information?
04-28-2021
01:30 AM
This issue has now been solved. The investigation went like this: when I got this issue from the development team, they told me that some tasks were failing and asked me how to fix it. I opened the YARN web UI to check the exact errors and found connection timeouts; that was my first clue. So I started thinking about why the port could not be connected to. Maybe there was a firewall? Or maybe one machine had a problem, and the issue appeared whenever a task was assigned to it? These were all assumptions, and after two days of checking, the answer was no: there was no firewall, and the issue happened randomly on every machine. Just last night I found that if the connection port was near 1000, the job failed with a connection timeout, but if the port was near 30000 or above, there was no issue at all. So I checked sysctl.conf and found the port-range setting was "net.ipv4.ip_local_port_range = 1024 65000". Finally I changed the range to "32768 65000", and the issue was solved.
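The mechanism behind the fix above: the kernel assigns each outgoing (or port-0-bound) socket a port from `net.ipv4.ip_local_port_range`. With the range starting at 1024, AM RPC endpoints could end up on low ports like 1079 or 1983, which this network evidently could not reach; raising the floor to 32768 keeps all dynamically assigned ports in the reachable band. A small sketch of that check (range values taken from the post):

```python
def parse_port_range(range_str):
    """Parse a sysctl-style 'low high' port range into a (low, high) tuple."""
    lo, hi = map(int, range_str.split())
    return lo, hi

def port_in_range(port, range_str):
    """Could the kernel have assigned this port under the given setting?"""
    lo, hi = parse_port_range(range_str)
    return lo <= port <= hi

# Under the original setting, a problematic port like 1983 could be allocated:
print(port_in_range(1983, "1024 65000"))    # True
# After raising the floor to 32768, it no longer can:
print(port_in_range(1983, "32768 65000"))   # False
```

On the hosts themselves, the effective range can be read from `/proc/sys/net/ipv4/ip_local_port_range` and persisted via `/etc/sysctl.conf` followed by `sysctl -p`, as described in the post.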
04-27-2021
03:02 PM
Is port 1983 the application master's port or not? I am not sure about that.
04-27-2021
03:01 PM
Log Type: syslog
Log Upload Time: Wed Apr 28 03:31:04 +0800 2021
Log Length: 132219
2021-04-28 03:27:36,319 INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Created MRAppMaster for application appattempt_1618548626214_128739_000001
2021-04-28 03:27:36,530 INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Executing with tokens:
2021-04-28 03:27:36,530 INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Kind: YARN_AM_RM_TOKEN, Service: , Ident: (org.apache.hadoop.yarn.security.AMRMTokenIdentifier@5b218417)
2021-04-28 03:27:36,706 INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: OutputCommitter set in config org.apache.hadoop.hive.ql.io.HiveFileFormatUtils$NullOutputCommitter
2021-04-28 03:27:36,708 INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: OutputCommitter is org.apache.hadoop.hive.ql.io.HiveFileFormatUtils$NullOutputCommitter
2021-04-28 03:27:37,232 WARN [main] org.apache.hadoop.util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2021-04-28 03:27:37,378 INFO [main] org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.mapreduce.jobhistory.EventType for class org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler
2021-04-28 03:27:37,379 INFO [main] org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.mapreduce.v2.app.job.event.JobEventType for class org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher
2021-04-28 03:27:37,380 INFO [main] org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.mapreduce.v2.app.job.event.TaskEventType for class org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskEventDispatcher
2021-04-28 03:27:37,381 INFO [main] org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.mapreduce.v2.app.job.event.TaskAttemptEventType for class org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher
2021-04-28 03:27:37,381 INFO [main] org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventType for class org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler
2021-04-28 03:27:37,385 INFO [main] org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.mapreduce.v2.app.speculate.Speculator$EventType for class org.apache.hadoop.mapreduce.v2.app.MRAppMaster$SpeculatorEventDispatcher
2021-04-28 03:27:37,386 INFO [main] org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.mapreduce.v2.app.rm.ContainerAllocator$EventType for class org.apache.hadoop.mapreduce.v2.app.MRAppMaster$ContainerAllocatorRouter
2021-04-28 03:27:37,386 INFO [main] org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncher$EventType for class org.apache.hadoop.mapreduce.v2.app.MRAppMaster$ContainerLauncherRouter
2021-04-28 03:27:37,430 INFO [main] org.apache.hadoop.mapreduce.v2.jobhistory.JobHistoryUtils: Default file system [hdfs://nameservice1:8020]
2021-04-28 03:27:37,449 INFO [main] org.apache.hadoop.mapreduce.v2.jobhistory.JobHistoryUtils: Default file system [hdfs://nameservice1:8020]
2021-04-28 03:27:37,469 INFO [main] org.apache.hadoop.mapreduce.v2.jobhistory.JobHistoryUtils: Default file system [hdfs://nameservice1:8020]
2021-04-28 03:27:37,481 INFO [main] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Emitting job history data to the timeline server is not enabled
2021-04-28 03:27:37,513 INFO [main] org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.mapreduce.v2.app.job.event.JobFinishEvent$Type for class org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobFinishEventHandler
2021-04-28 03:27:37,673 INFO [main] org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
2021-04-28 03:27:37,724 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
2021-04-28 03:27:37,725 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: MRAppMaster metrics system started
2021-04-28 03:27:37,735 INFO [main] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Adding job token for job_1618548626214_128739 to jobTokenSecretManager
2021-04-28 03:27:37,855 INFO [main] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Not uberizing job_1618548626214_128739 because: not enabled; too much RAM;
2021-04-28 03:27:37,877 INFO [main] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Input size for job job_1618548626214_128739 = 23256534. Number of splits = 7
2021-04-28 03:27:37,877 INFO [main] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Number of reduces for job job_1618548626214_128739 = 0
2021-04-28 03:27:37,877 INFO [main] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: job_1618548626214_128739Job Transitioned from NEW to INITED
2021-04-28 03:27:37,878 INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: MRAppMaster launching normal, non-uberized, multi-container job job_1618548626214_128739.
2021-04-28 03:27:37,905 INFO [main] org.apache.hadoop.ipc.CallQueueManager: Using callQueue: class java.util.concurrent.LinkedBlockingQueue queueCapacity: 100
2021-04-28 03:27:37,914 INFO [Socket Reader #1 for port 9115] org.apache.hadoop.ipc.Server: Starting Socket Reader #1 for port 9115
2021-04-28 03:27:37,951 INFO [main] org.apache.hadoop.yarn.factories.impl.pb.RpcServerFactoryPBImpl: Adding protocol org.apache.hadoop.mapreduce.v2.api.MRClientProtocolPB to the server
2021-04-28 03:27:37,952 INFO [IPC Server Responder] org.apache.hadoop.ipc.Server: IPC Server Responder: starting
2021-04-28 03:27:37,952 INFO [IPC Server listener on 9115] org.apache.hadoop.ipc.Server: IPC Server listener on 9115: starting
2021-04-28 03:27:37,953 INFO [main] org.apache.hadoop.mapreduce.v2.app.client.MRClientService: Instantiated MRClientService at dataware-14/10.39.58.19:9115
2021-04-28 03:27:38,009 INFO [main] org.mortbay.log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
2021-04-28 03:27:38,015 INFO [main] org.apache.hadoop.security.authentication.server.AuthenticationFilter: Unable to initialize FileSignerSecretProvider, falling back to use random secrets.
2021-04-28 03:27:38,019 INFO [main] org.apache.hadoop.http.HttpRequestLog: Http request log for http.requests.mapreduce is not defined
2021-04-28 03:27:38,027 INFO [main] org.apache.hadoop.http.HttpServer2: Added global filter 'safety' (class=org.apache.hadoop.http.HttpServer2$QuotingInputFilter)
2021-04-28 03:27:38,072 INFO [main] org.apache.hadoop.http.HttpServer2: Added filter AM_PROXY_FILTER (class=org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter) to context mapreduce
2021-04-28 03:27:38,074 INFO [main] org.apache.hadoop.http.HttpServer2: Added filter AM_PROXY_FILTER (class=org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter) to context static
2021-04-28 03:27:38,077 INFO [main] org.apache.hadoop.http.HttpServer2: adding path spec: /mapreduce/*
2021-04-28 03:27:38,077 INFO [main] org.apache.hadoop.http.HttpServer2: adding path spec: /ws/*
2021-04-28 03:27:38,086 INFO [main] org.apache.hadoop.http.HttpServer2: Jetty bound to port 41305
2021-04-28 03:27:38,086 INFO [main] org.mortbay.log: jetty-6.1.26.cloudera.4
2021-04-28 03:27:38,120 INFO [main] org.mortbay.log: Extract jar:file:/opt/cloudera/parcels/CDH-5.13.3-1.cdh5.13.3.p0.2/jars/hadoop-yarn-common-2.6.0-cdh5.13.3.jar!/webapps/mapreduce to /tmp/Jetty_0_0_0_0_41305_mapreduce____2p8bem/webapp
2021-04-28 03:27:38,436 INFO [main] org.mortbay.log: Started HttpServer2$SelectChannelConnectorWithSafeStartup@0.0.0.0:41305
2021-04-28 03:27:38,437 INFO [main] org.apache.hadoop.yarn.webapp.WebApps: Web app /mapreduce started at 41305
2021-04-28 03:27:38,742 INFO [main] org.apache.hadoop.yarn.webapp.WebApps: Registered webapp guice modules
2021-04-28 03:27:38,745 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.speculate.DefaultSpeculator: JOB_CREATE job_1618548626214_128739
2021-04-28 03:27:38,748 INFO [main] org.apache.hadoop.ipc.CallQueueManager: Using callQueue: class java.util.concurrent.LinkedBlockingQueue queueCapacity: 3000
2021-04-28 03:27:38,749 INFO [Socket Reader #1 for port 1983] org.apache.hadoop.ipc.Server: Starting Socket Reader #1 for port 1983
2021-04-28 03:27:38,753 INFO [IPC Server Responder] org.apache.hadoop.ipc.Server: IPC Server Responder: starting
2021-04-28 03:27:38,753 INFO [IPC Server listener on 1983] org.apache.hadoop.ipc.Server: IPC Server listener on 1983: starting
2021-04-28 03:27:38,775 INFO [main] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: nodeBlacklistingEnabled:true
2021-04-28 03:27:38,775 INFO [main] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: maxTaskFailuresPerNode is 3
2021-04-28 03:27:38,775 INFO [main] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: blacklistDisablePercent is 33
2021-04-28 03:27:38,848 INFO [main] org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider: Failing over to rm237
2021-04-28 03:27:38,877 INFO [main] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: maxContainerCapability: <memory:24576, vCores:14>
2021-04-28 03:27:38,877 INFO [main] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: queue: root.etl_core
2021-04-28 03:27:38,881 INFO [main] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Upper limit on the thread pool size is 500
2021-04-28 03:27:38,881 INFO [main] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: The thread pool initial size is 10
2021-04-28 03:27:38,889 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: job_1618548626214_128739Job Transitioned from INITED to SETUP
2021-04-28 03:27:38,893 INFO [CommitterEvent Processor #0] org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler: Processing the event EventType: JOB_SETUP
2021-04-28 03:27:38,895 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: job_1618548626214_128739Job Transitioned from SETUP to RUNNING
2021-04-28 03:27:38,970 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: task_1618548626214_128739_m_000000 Task Transitioned from NEW to SCHEDULED
2021-04-28 03:27:38,988 INFO [eventHandlingThread] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Event Writer setup for JobId: job_1618548626214_128739, File: hdfs://nameservice1:8020/user/hive/.staging/job_1618548626214_128739/job_1618548626214_128739_1.jhist
2021-04-28 03:27:38,994 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: task_1618548626214_128739_m_000001 Task Transitioned from NEW to SCHEDULED
2021-04-28 03:27:39,013 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: task_1618548626214_128739_m_000002 Task Transitioned from NEW to SCHEDULED
2021-04-28 03:27:39,032 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: task_1618548626214_128739_m_000003 Task Transitioned from NEW to SCHEDULED
2021-04-28 03:27:39,049 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: task_1618548626214_128739_m_000004 Task Transitioned from NEW to SCHEDULED
2021-04-28 03:27:39,077 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: task_1618548626214_128739_m_000005 Task Transitioned from NEW to SCHEDULED
2021-04-28 03:27:39,095 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: task_1618548626214_128739_m_000006 Task Transitioned from NEW to SCHEDULED
2021-04-28 03:27:39,097 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1618548626214_128739_m_000000_0 TaskAttempt Transitioned from NEW to UNASSIGNED
2021-04-28 03:27:39,097 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1618548626214_128739_m_000001_0 TaskAttempt Transitioned from NEW to UNASSIGNED
2021-04-28 03:27:39,098 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1618548626214_128739_m_000002_0 TaskAttempt Transitioned from NEW to UNASSIGNED
2021-04-28 03:27:39,098 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1618548626214_128739_m_000003_0 TaskAttempt Transitioned from NEW to UNASSIGNED
2021-04-28 03:27:39,098 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1618548626214_128739_m_000004_0 TaskAttempt Transitioned from NEW to UNASSIGNED
2021-04-28 03:27:39,098 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1618548626214_128739_m_000005_0 TaskAttempt Transitioned from NEW to UNASSIGNED
2021-04-28 03:27:39,098 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1618548626214_128739_m_000006_0 TaskAttempt Transitioned from NEW to UNASSIGNED
2021-04-28 03:27:39,099 INFO [Thread-53] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: mapResourceRequest:<memory:6144, vCores:1>
Please help me check port 1983. Every time the job fails, the retried connection port is 1983; after several retries the job fails with a connection timeout.
... Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2021-04-28 03:27:59,247 INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server: dataware-14/10.39.58.19:1983. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2021-04-28 03:28:03,247 INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server: dataware-14/10.39.58.19:1983. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2021-04-28 03:28:07,248 INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server: dataware-14/10.39.58.19:1983. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2021-04-28 03:28:11,249 INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server: dataware-14/10.39.58.19:1983. Already tried 6 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2021-04-28 03:28:15,250 INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server: dataware-14/10.39.58.19:1983. Already tried 7 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2021-04-28 03:28:19,251 INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server: dataware-14/10.39.58.19:1983. Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2021-04-28 03:28:23,253 INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server: dataware-14/10.39.58.19:1983. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2021-04-28 03:28:26,258 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.net.ConnectException: Call From dataware-17/10.39.58.15 to dataware-14:1983 failed on connection exception: java.net.ConnectException: Connection timed out; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:791)
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:731)
at org.apache.hadoop.ipc.Client.call(Client.java:1508)
at org.apache.hadoop.ipc.Client.call(Client.java:1441)
at org.apache.hadoop.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:246)
at com.sun.proxy.$Proxy9.getTask(Unknown Source)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:132)
Caused by: java.net.ConnectException: Connection timed out
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:530)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:494)
at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:648)
at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:744)
at org.apache.hadoop.ipc.Client$Connection.access$3000(Client.java:396)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1557)
at org.apache.hadoop.ipc.Client.call(Client.java:1480)
... 4 more
04-25-2021
08:55 AM
You are right, the connection is between two NodeManagers, and I assume dataware-3:1079 is the app master and the other end is a task; that's why I said the connection timeout is from the task to the AppMaster. Since this kind of error happens randomly, about once per hour, it is really hard for me to find the root cause.
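For intermittent timeouts like this, it can help to probe the suspect port independently of a job run, from the same node a failing task ran on. A minimal sketch of the check the failing YarnChild is effectively performing (the host and port below are placeholders taken from the logs):

```python
import socket

def can_connect(host, port, timeout=3.0):
    """TCP probe: does host:port accept a connection within the timeout?"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # covers both "connection refused" and "connection timed out"
        return False

# Example usage against the addresses seen in the logs (hypothetical here):
# can_connect("dataware-3", 1079)
```

A fast `False` (connection refused) means the AM process is gone; a slow `False` (timeout, as in these logs) means packets are being dropped somewhere between the two nodes, which points at filtering or routing rather than the AM itself.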
04-25-2021
01:12 AM
Recently, MapReduce jobs have sometimes been failing; the details are below. After checking the map tasks, the log looks like this:
Log Type: syslog
Log Upload Time: Sun Apr 25 13:54:17 +0800 2021
Log Length: 5507
2021-04-25 13:51:01,806 INFO [main] org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
2021-04-25 13:51:01,893 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
2021-04-25 13:51:01,893 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: MapTask metrics system started
2021-04-25 13:51:01,895 INFO [main] org.apache.hadoop.mapred.YarnChild: Executing with tokens:
2021-04-25 13:51:01,895 INFO [main] org.apache.hadoop.mapred.YarnChild: Kind: mapreduce.job, Service: job_1618548626214_99981, Ident: (org.apache.hadoop.mapreduce.security.token.JobTokenIdentifier@732c2a62)
2021-04-25 13:51:02,182 INFO [main] org.apache.hadoop.mapred.YarnChild: Sleeping for 0ms before retrying again. Got null now.
2021-04-25 13:51:06,267 INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server: dataware-3/10.39.58.16:1079. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2021-04-25 13:51:10,268 INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server: dataware-3/10.39.58.16:1079. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2021-04-25 13:51:14,268 INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server: dataware-3/10.39.58.16:1079. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2021-04-25 13:51:18,268 INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server: dataware-3/10.39.58.16:1079. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2021-04-25 13:51:22,269 INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server: dataware-3/10.39.58.16:1079. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2021-04-25 13:51:26,270 INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server: dataware-3/10.39.58.16:1079. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2021-04-25 13:51:30,271 INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server: dataware-3/10.39.58.16:1079. Already tried 6 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2021-04-25 13:51:34,272 INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server: dataware-3/10.39.58.16:1079. Already tried 7 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2021-04-25 13:51:38,272 INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server: dataware-3/10.39.58.16:1079. Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2021-04-25 13:51:42,272 INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server: dataware-3/10.39.58.16:1079. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2021-04-25 13:51:45,274 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.net.ConnectException: Call From dataware-14/10.39.58.19 to dataware-3:1079 failed on connection exception: java.net.ConnectException: Connection timed out; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:791)
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:731)
at org.apache.hadoop.ipc.Client.call(Client.java:1508)
at org.apache.hadoop.ipc.Client.call(Client.java:1441)
at org.apache.hadoop.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:246)
at com.sun.proxy.$Proxy9.getTask(Unknown Source)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:132)
Caused by: java.net.ConnectException: Connection timed out
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:530)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:494)
at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:648)
at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:744)
at org.apache.hadoop.ipc.Client$Connection.access$3000(Client.java:396)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1557)
at org.apache.hadoop.ipc.Client.call(Client.java:1480)
... 4 more
2021-04-25 13:51:45,275 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping MapTask metrics system...
2021-04-25 13:51:45,275 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: MapTask metrics system stopped.
2021-04-25 13:51:45,275 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: MapTask metrics system shutdown complete.
From the log above, we can see that the task's connection to the Application Master timed out, but this error happens randomly. Can anyone give me some advice on this error? Thanks.
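One way to narrow this down is to confirm basic TCP reachability from the node running the failing task (dataware-14 in the log) to the Application Master's host and port (dataware-3:1079). A minimal sketch for such a check; the hostname and port here are taken from the log above and are only an example, adjust them for your cluster:

```python
import socket

def check_port(host, port, timeout=5):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers both "connection refused" and "connection timed out".
        return False

# Run from the node where the task failed (here, dataware-14):
# check_port("dataware-3", 1079)
```

Note that "Connection timed out" (rather than "Connection refused") often points to a firewall silently dropping packets or a routing problem between the two nodes, and since the AM listens on an ephemeral port chosen at container launch, any firewall rules would need to allow the whole NodeManager port range between cluster hosts.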
Labels:
- Apache Hadoop
04-15-2021
08:59 AM
Hi everyone, my company is a group with many subsidiary companies, so I am trying to decide whether the Cloudera Virtual Private Cluster solution is a good fit for us. All the data would be stored in the base cluster, and whenever a new project or business needs separate compute resources, I would simply assign a few machines as a new compute cluster (for Hive or Flink, for example). This kind of architecture makes it straightforward for me to plan resource allocation. However, I don't have any experience with VPC. Does anyone have experience with it? Could you share your experience or thoughts? Thanks.
Labels:
- Cloudera Data Platform (CDP)
12-15-2019
05:39 AM
After monitoring the agent status for more than ten days, I believe this issue has been resolved by your solution. It seems the issue was caused by the Impala logs: since I changed the Impala log level to WARN, it hasn't happened again. Thanks.
12-10-2019
01:20 AM
After changing the Impala log level to WARN, the agent connectivity issue occurs much less frequently, but it still happened on two servers. After stopping the agent and deleting the Impala logs on those two servers, the issue hasn't recurred there either. I will continue to monitor the agent issue and report back to you.