Member since: 08-08-2017
Posts: 1652
Kudos Received: 30
Solutions: 11
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 1964 | 06-15-2020 05:23 AM |
| | 16014 | 01-30-2020 08:04 PM |
| | 2107 | 07-07-2019 09:06 PM |
| | 8235 | 01-27-2018 10:17 PM |
| | 4663 | 12-31-2017 10:12 PM |
01-24-2018
09:25 PM
grep -i war /var/log/hadoop/hdfs/hadoop-hdfs-journalnode-master03.sys573.com.log
2018-01-24 19:03:41,115 WARN timeline.HadoopTimelineMetricsSink (HadoopTimelineMetricsSink.java:putMetrics(349)) - Unable to send metrics to collector by address:http://master02.sys573.com:6188/ws/v1/timeline/metrics
2018-01-24 19:28:08,819 WARN mortbay.log (Slf4jLog.java:warn(76)) - Can't reuse /tmp/Jetty_0_0_0_0_8480_journal____.8g4awa, using /tmp/Jetty_0_0_0_0_8480_journal____.8g4awa_661614759039131704
2018-01-24 19:29:18,310 WARN timeline.HadoopTimelineMetricsSink (HadoopTimelineMetricsSink.java:putMetrics(349)) - Unable to send metrics to collector by address:http://null:6188/ws/v1/timeline/metrics
2018-01-24 19:30:28,393 WARN timeline.HadoopTimelineMetricsSink (HadoopTimelineMetricsSink.java:putMetrics(349)) - Unable to send metrics to collector by address:http://null:6188/ws/v1/timeline/metrics
2018-01-24 19:56:39,690 WARN timeline.HadoopTimelineMetricsSink (HadoopTimelineMetricsSink.java:putMetrics(349)) - Unable to send metrics to collector by address:http://null:6188/ws/v1/timeline/metrics
2018-01-24 19:59:38,233 WARN timeline.HadoopTimelineMetricsSink (HadoopTimelineMetricsSink.java:putMetrics(349)) - Unable to send metrics to collector by address:http://null:6188/ws/v1/timeline/metrics
2018-01-24 20:29:58,228 WARN timeline.HadoopTimelineMetricsSink (HadoopTimelineMetricsSink.java:putMetrics(349)) - Unable to send metrics to collector by address:http://null:6188/ws/v1/timeline/metrics
2018-01-24 20:37:29,599 WARN timeline.HadoopTimelineMetricsSink (HadoopTimelineMetricsSink.java:putMetrics(349)) - Unable to send metrics to collector by address:http://null:6188/ws/v1/timeline/metrics
2018-01-24 21:00:28,236 WARN timeline.HadoopTimelineMetricsSink (HadoopTimelineMetricsSink.java:putMetrics(349)) - Unable to send metrics to collector by address:http://null:6188/ws/v1/timeline/metrics
2018-01-24 21:03:05,331 WARN ipc.Server (Server.java:processResponse(1273)) - IPC Server handler 1 on 8485, call org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocol.format from :49400 Call#4 Retry#0: output error
2018-01-24 21:16:59,654 WARN timeline.HadoopTimelineMetricsSink (HadoopTimelineMetricsSink.java:putMetrics(349)) - Unable to send metrics to collector by address:http://null:6188/ws/v1/timeline/metrics
2018-01-24 21:21:08,278 WARN timeline.HadoopTimelineMetricsSink (HadoopTimelineMetricsSink.java:putMetrics(349)) - Unable to send metrics to collector by address:http://null:6188/ws/v1/timeline/metrics
01-24-2018
09:22 PM
grep -i error /var/log/hadoop/hdfs/hadoop-hdfs-journalnode-master03.sys57.com.log
2018-01-24 19:05:38,295 ERROR server.JournalNode (LogAdapter.java:error(69)) - RECEIVED SIGNAL 15: SIGTERM
2018-01-24 19:53:09,054 ERROR server.JournalNode (LogAdapter.java:error(69)) - RECEIVED SIGNAL 15: SIGTERM
2018-01-24 21:03:05,331 WARN ipc.Server (Server.java:processResponse(1273)) - IPC Server handler 1 on 8485, call org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocol.format from :49400 Call#4 Retry#0: output error
01-24-2018
09:17 PM
@jay I ran hdfs namenode -bootstrapStandby on the standby, but I get: Retrying connect to server: master01.sys57.com/100.4.3.21:8020. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS). That is because both NameNodes are down - I can't start the NameNode on either machine.
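For what it's worth, -bootstrapStandby copies the namespace from the other NameNode over RPC, so it can only succeed while that NameNode is up. A minimal sketch, assuming the hostname and port from the retry message above, that checks reachability before attempting the bootstrap:

```shell
#!/usr/bin/env bash
# Return 0 if a TCP port is reachable, non-zero otherwise (bash /dev/tcp).
port_open() {
  (exec 3<>"/dev/tcp/$1/$2") 2>/dev/null
}

# Hypothetical usage: only bootstrap the standby once the active NameNode
# answers on its RPC port (8020, per the retry message in this post).
if port_open master01.sys57.com 8020; then
  hdfs namenode -bootstrapStandby
else
  echo "active NameNode RPC port not reachable yet - start it first" >&2
fi
```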
01-24-2018
09:13 PM
@Jay I get the same on both nodes.
01-24-2018
08:06 PM
In our Ambari cluster we have a very strange problem. We restarted all servers (master01-03), and on each master we started the services from the beginning, in the correct order: first we started the ZooKeeper Server on all masters, then the JournalNode on all masters. But we noticed that on the last master machine the JournalNode restarts every 10-20 seconds, while on all the other machines the JournalNode is stable. Please advise - why does this happen?
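One way to confirm the restart cadence before digging into the logs is to watch the JournalNode PID. A rough sketch; the PID file path below is the usual HDP default and is an assumption, so check it on your cluster:

```shell
#!/usr/bin/env bash
# Return 0 if the given PID is a live process.
pid_alive() {
  kill -0 "$1" 2>/dev/null
}

# Hypothetical watch loop: print a timestamp whenever the JournalNode PID
# changes or the process dies. Path assumes HDP defaults.
PIDFILE=/var/run/hadoop/hdfs/hadoop-hdfs-journalnode.pid
watch_journalnode() {
  local last="" pid
  while sleep 5; do
    pid=$(cat "$PIDFILE" 2>/dev/null)
    if [ "$pid" != "$last" ] || ! pid_alive "$pid"; then
      echo "$(date '+%F %T') JournalNode pid=$pid alive=$(pid_alive "$pid" && echo yes || echo no)"
      last=$pid
    fi
  done
}
```

If the PID changes on a steady interval, something (e.g. a supervisor or an Ambari alert recovery action) is restarting it rather than the process crashing randomly.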
Labels:
- Apache Ambari
- Apache Hadoop
01-24-2018
05:48 PM
I also tried hadoop namenode -format on the master03 machine, but we got this: Could not format one or more JournalNodes. 1 exceptions thrown:
Directory /data/hadoop/hdfs/journal/hdfsha is in an inconsistent state: Can't format the storage directory because the current directory is not empty
The complete log from hadoop namenode -format:
18/01/24 17:36:19 ERROR namenode.NameNode: Failed to start namenode.
org.apache.hadoop.hdfs.qjournal.client.QuorumException: Could not format one or more JournalNodes. 1 exceptions thrown:
8485: Directory /data/hadoop/hdfs/journal/hdfsha is in an inconsistent state: Can't format the storage directory because the current directory is not empty.
at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.checkEmptyCurrent(Storage.java:482)
at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.analyzeStorage(Storage.java:558)
at org.apache.hadoop.hdfs.qjournal.server.JNStorage.format(JNStorage.java:185)
at org.apache.hadoop.hdfs.qjournal.server.Journal.format(Journal.java:217)
at org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.format(JournalNodeRpcServer.java:145)
at org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.format(QJournalProtocolServerSideTranslatorPB.java:145)
at org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:25419)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2351)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2347)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2345)
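The format failure above is the JournalNode refusing to overwrite a non-empty current/ directory. A hedged sketch for checking it first (path taken from the exception); note that deleting journal edits destroys the shared edit log, so if anything, move the directory aside rather than removing it:

```shell
#!/usr/bin/env bash
# Return 0 if the directory exists and contains no entries (including dotfiles).
dir_empty() {
  [ -d "$1" ] && [ -z "$(ls -A "$1")" ]
}

JOURNAL_CURRENT=/data/hadoop/hdfs/journal/hdfsha/current
if dir_empty "$JOURNAL_CURRENT"; then
  echo "journal current dir is empty - format should proceed"
else
  # Don't delete: archive so the edits can be recovered if needed, e.g.:
  echo "journal current dir is not empty; consider archiving it first:"
  echo "  mv $JOURNAL_CURRENT ${JOURNAL_CURRENT}.bak.\$(date +%s)"
fi
```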
01-24-2018
04:31 PM
We have an old Ambari cluster, version 2.6. From the logs (under /var/log/hadoop/hdfs) we can see the error - No valid image files found. I am not sure about my solution, but does it mean that we need to delete the edits_inprogress_XXXXX files under /hadoop/hdfs/journal/hdfsha/current and then restart the standby NameNode service?
2018-01-24 16:10:27,826 ERROR namenode.NameNode (NameNode.java:main(1774)) - Failed to start namenode.
java.io.FileNotFoundException: No valid image files found
at org.apache.hadoop.hdfs.server.namenode.FSImageTransactionalStorageInspector.getLatestImages(FSImageTransactionalStorageInspector.java:165)
at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:618)
at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:289)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1045)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:703)
at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:688)
at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:752)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:992)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:976)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1701)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1769)
2018-01-24 16:10:27,829 INFO util.ExitUtil (ExitUtil.java:terminate(124)) - Exiting with status 1
2018-01-24 16:10:27,845 INFO namenode.NameNode (LogAdapter.java:info(47)) - SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at master03.sys573.com/102.14.22.29
************************************************************/
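Note that the exception comes from FSImageTransactionalStorageInspector, which inspects the NameNode's local name directory, so "No valid image files found" points at a missing or damaged local fsimage rather than the JournalNode edits; deleting edits_inprogress_* would not address it. A small sketch to see which fsimage files the name directory actually holds (the path is an assumption; check dfs.namenode.name.dir):

```shell
#!/usr/bin/env bash
# Print the highest fsimage transaction id found in a NameNode name dir.
latest_fsimage_txid() {
  ls "$1" 2>/dev/null | sed -n 's/^fsimage_\([0-9]\{1,\}\)$/\1/p' | sort -n | tail -1
}

# Hypothetical default path; verify against dfs.namenode.name.dir.
latest_fsimage_txid /hadoop/hdfs/namenode/current
```

If no fsimage_* files are listed, re-syncing the standby from the healthy NameNode (hdfs namenode -bootstrapStandby) is usually the safer recovery path.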
Labels:
- Apache Ambari
- Apache Hadoop
01-24-2018
02:38 PM
When I run it alone we get:
su hdfs -l -s /bin/bash -c 'ulimit -c unlimited ; /usr/hdp/current/hadoop-client/sbin/hadoop-daemon.sh --config /usr/hdp/current/hadoop-client/conf start namenode'
starting namenode, logging to /var/log/hadoop/hdfs/hadoop-hdfs-namenode-master03.sys57.com.out
echo $?
1
01-24-2018
02:09 PM
When we start the ZKFailoverController service, the ZooKeeper Server fails after a few seconds. For example: from Ambari, the ZooKeeper Server service is stable, but as soon as we start the ZKFailoverController service, the ZooKeeper Server fails immediately, and so does the ZKFailoverController service itself. Please advise - what could be the root cause of this?
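Since ZKFC connects to the ZooKeeper quorum at startup, one thing worth checking is whether every ZooKeeper server is actually healthy at that moment. A sketch using ZooKeeper's four-letter ruok command over plain TCP; the host names and port are assumptions, so check ha.zookeeper.quorum for the real values:

```shell
#!/usr/bin/env bash
# Send ZooKeeper's 'ruok' four-letter command; a healthy server replies 'imok'.
zk_ruok() {
  local reply
  reply=$( (exec 3<>"/dev/tcp/$1/$2" && printf 'ruok' >&3 && head -c4 <&3) 2>/dev/null )
  [ "$reply" = "imok" ]
}

# Hypothetical quorum hosts; replace with the ha.zookeeper.quorum entries.
for host in master01 master02 master03; do
  if zk_ruok "$host" 2181; then
    echo "$host: ZooKeeper ok"
  else
    echo "$host: ZooKeeper NOT responding"
  fi
done
```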
Labels:
- Apache Ambari
- Apache Hadoop
01-24-2018
02:04 PM
We are trying to start the Standby NameNode on the master03 machine, but without success. From the error log we can see the following, but we can't figure out what the problem is. Please advise what could be the reason the NameNode does not start, according to the following log:
Traceback (most recent call last):
File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/namenode.py", line 424, in <module>
NameNode().execute()
File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 314, in execute
method(env)
File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/namenode.py", line 100, in start
upgrade_suspended=params.upgrade_suspended, env=env)
File "/usr/lib/python2.6/site-packages/ambari_commons/os_family_impl.py", line 89, in thunk
return fn(*args, **kwargs)
File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/hdfs_namenode.py", line 167, in namenode
create_log_dir=True
File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/utils.py", line 271, in service
Execute(daemon_cmd, not_if=process_id_exists_command, environment=hadoop_env_exports)
File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 155, in __init__
self.env.run()
File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 160, in run
self.run_action(resource, action)
File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 124, in run_action
provider_action()
File "/usr/lib/python2.6/site-packages/resource_management/core/providers/system.py", line 262, in action_run
tries=self.resource.tries, try_sleep=self.resource.try_sleep)
File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 72, in inner
result = function(command, **kwargs)
File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 102, in checked_call
tries=tries, try_sleep=try_sleep, timeout_kill_strategy=timeout_kill_strategy)
File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 150, in _call_wrapper
result = _call(command, **kwargs_copy)
File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 303, in _call
raise ExecutionFailed(err_msg, code, out, err)
resource_management.core.exceptions.ExecutionFailed: Execution of 'ambari-sudo.sh su hdfs -l -s /bin/bash -c 'ulimit -c unlimited ; /usr/hdp/current/hadoop-client/sbin/hadoop-daemon.sh --config /usr/hdp/current/hadoop-client/conf start namenode'' returned 1. starting namenode, logging to /var/log/hadoop/hdfs/hadoop-hdfs-namenode-master03.sys57.com.out
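The .out file that Ambari points at usually holds only the ulimit banner; the actual failure reason lands in the matching .log file. A small sketch to pull the last ERROR/FATAL lines from it (log filename taken from the output above):

```shell
#!/usr/bin/env bash
# Print the last N (default 5) ERROR/FATAL lines from a Hadoop daemon log.
last_errors() {
  grep -E 'ERROR|FATAL' "$1" | tail -n "${2:-5}"
}

last_errors /var/log/hadoop/hdfs/hadoop-hdfs-namenode-master03.sys57.com.log 10
```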
Labels:
- Apache Ambari
- Apache Hadoop