Member since: 08-08-2017
Posts: 1652
Kudos Received: 30
Solutions: 11
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 1964 | 06-15-2020 05:23 AM |
| | 16014 | 01-30-2020 08:04 PM |
| | 2107 | 07-07-2019 09:06 PM |
| | 8235 | 01-27-2018 10:17 PM |
| | 4663 | 12-31-2017 10:12 PM |
01-24-2018
09:25 PM
grep -i war /var/log/hadoop/hdfs/hadoop-hdfs-journalnode-master03.sys573.com.log
2018-01-24 19:03:41,115 WARN timeline.HadoopTimelineMetricsSink (HadoopTimelineMetricsSink.java:putMetrics(349)) - Unable to send metrics to collector by address:http://master02.sys573.com:6188/ws/v1/timeline/metrics
2018-01-24 19:28:08,819 WARN mortbay.log (Slf4jLog.java:warn(76)) - Can't reuse /tmp/Jetty_0_0_0_0_8480_journal____.8g4awa, using /tmp/Jetty_0_0_0_0_8480_journal____.8g4awa_661614759039131704
2018-01-24 19:29:18,310 WARN timeline.HadoopTimelineMetricsSink (HadoopTimelineMetricsSink.java:putMetrics(349)) - Unable to send metrics to collector by address:http://null:6188/ws/v1/timeline/metrics
2018-01-24 19:30:28,393 WARN timeline.HadoopTimelineMetricsSink (HadoopTimelineMetricsSink.java:putMetrics(349)) - Unable to send metrics to collector by address:http://null:6188/ws/v1/timeline/metrics
2018-01-24 19:56:39,690 WARN timeline.HadoopTimelineMetricsSink (HadoopTimelineMetricsSink.java:putMetrics(349)) - Unable to send metrics to collector by address:http://null:6188/ws/v1/timeline/metrics
2018-01-24 19:59:38,233 WARN timeline.HadoopTimelineMetricsSink (HadoopTimelineMetricsSink.java:putMetrics(349)) - Unable to send metrics to collector by address:http://null:6188/ws/v1/timeline/metrics
2018-01-24 20:29:58,228 WARN timeline.HadoopTimelineMetricsSink (HadoopTimelineMetricsSink.java:putMetrics(349)) - Unable to send metrics to collector by address:http://null:6188/ws/v1/timeline/metrics
2018-01-24 20:37:29,599 WARN timeline.HadoopTimelineMetricsSink (HadoopTimelineMetricsSink.java:putMetrics(349)) - Unable to send metrics to collector by address:http://null:6188/ws/v1/timeline/metrics
2018-01-24 21:00:28,236 WARN timeline.HadoopTimelineMetricsSink (HadoopTimelineMetricsSink.java:putMetrics(349)) - Unable to send metrics to collector by address:http://null:6188/ws/v1/timeline/metrics
2018-01-24 21:03:05,331 WARN ipc.Server (Server.java:processResponse(1273)) - IPC Server handler 1 on 8485, call org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocol.format from :49400 Call#4 Retry#0: output error
2018-01-24 21:16:59,654 WARN timeline.HadoopTimelineMetricsSink (HadoopTimelineMetricsSink.java:putMetrics(349)) - Unable to send metrics to collector by address:http://null:6188/ws/v1/timeline/metrics
2018-01-24 21:21:08,278 WARN timeline.HadoopTimelineMetricsSink (HadoopTimelineMetricsSink.java:putMetrics(349)) - Unable to send metrics to collector by address:http://null:6188/ws/v1/timeline/metrics
01-24-2018
09:22 PM
grep -i error /var/log/hadoop/hdfs/hadoop-hdfs-journalnode-master03.sys57.com.log
2018-01-24 19:05:38,295 ERROR server.JournalNode (LogAdapter.java:error(69)) - RECEIVED SIGNAL 15: SIGTERM
2018-01-24 19:53:09,054 ERROR server.JournalNode (LogAdapter.java:error(69)) - RECEIVED SIGNAL 15: SIGTERM
2018-01-24 21:03:05,331 WARN ipc.Server (Server.java:processResponse(1273)) - IPC Server handler 1 on 8485, call org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocol.format from :49400 Call#4 Retry#0: output error
01-24-2018
09:17 PM
@jay I ran hdfs namenode -bootstrapStandby on the standby, but I get: Retrying connect to server: master01.sys57.com/100.4.3.21:8020. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS). That is because both NameNodes are down - I can't start the NameNode on either machine.
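For what it's worth, -bootstrapStandby copies the namespace from the other NameNode over RPC, so it can only succeed while that NameNode is up. A minimal sketch, assuming the hostname and port from the retry message above, that checks reachability before attempting the bootstrap:

```shell
#!/usr/bin/env bash
# Return 0 if a TCP port is reachable, non-zero otherwise (bash /dev/tcp).
port_open() {
  (exec 3<>"/dev/tcp/$1/$2") 2>/dev/null
}

# Hypothetical usage: only bootstrap the standby once the active NameNode
# answers on its RPC port (8020, per the retry message in this post).
if port_open master01.sys57.com 8020; then
  hdfs namenode -bootstrapStandby
else
  echo "active NameNode RPC port not reachable yet - start it first" >&2
fi
```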
01-24-2018
09:13 PM
@Jay I get the same on both nodes.
01-24-2018
08:06 PM
In our Ambari cluster we have a very strange problem. We restarted all servers (master01-03), and on each master we started the services from the beginning, in the correct order: first we started the ZooKeeper Server on all masters, then the JournalNode on all masters. But we noticed that on the last master machine the JournalNode restarts every 10-20 seconds, while on all the other machines the JournalNode is stable. Please advise - why does this happen?
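One way to confirm the restart cadence before digging into the logs is to watch the JournalNode PID. A rough sketch; the PID file path below is the usual HDP default and is an assumption, so check it on your cluster:

```shell
#!/usr/bin/env bash
# Return 0 if the given PID is a live process.
pid_alive() {
  kill -0 "$1" 2>/dev/null
}

# Hypothetical watch loop: print a timestamp whenever the JournalNode PID
# changes or the process dies. Path assumes HDP defaults.
PIDFILE=/var/run/hadoop/hdfs/hadoop-hdfs-journalnode.pid
watch_journalnode() {
  local last="" pid
  while sleep 5; do
    pid=$(cat "$PIDFILE" 2>/dev/null)
    if [ "$pid" != "$last" ] || ! pid_alive "$pid"; then
      echo "$(date '+%F %T') JournalNode pid=$pid alive=$(pid_alive "$pid" && echo yes || echo no)"
      last=$pid
    fi
  done
}
```

If the PID changes on a steady interval, something (e.g. a supervisor or an Ambari alert recovery action) is restarting it rather than the process crashing randomly.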
Labels:
- Apache Ambari
- Apache Hadoop
01-24-2018
05:48 PM
I also tried hadoop namenode -format on the master03 machine, but we got this: Could not format one or more JournalNodes. 1 exceptions thrown:
Directory /data/hadoop/hdfs/journal/hdfsha is in an inconsistent state: Can't format the storage directory because the current directory is not empty
The complete log from hadoop namenode -format:
18/01/24 17:36:19 ERROR namenode.NameNode: Failed to start namenode.
org.apache.hadoop.hdfs.qjournal.client.QuorumException: Could not format one or more JournalNodes. 1 exceptions thrown:
8485: Directory /data/hadoop/hdfs/journal/hdfsha is in an inconsistent state: Can't format the storage directory because the current directory is not empty.
at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.checkEmptyCurrent(Storage.java:482)
at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.analyzeStorage(Storage.java:558)
at org.apache.hadoop.hdfs.qjournal.server.JNStorage.format(JNStorage.java:185)
at org.apache.hadoop.hdfs.qjournal.server.Journal.format(Journal.java:217)
at org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.format(JournalNodeRpcServer.java:145)
at org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.format(QJournalProtocolServerSideTranslatorPB.java:145)
at org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:25419)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2351)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2347)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2345)
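The format failure above is the JournalNode refusing to overwrite a non-empty current/ directory. A hedged sketch for checking it first (path taken from the exception); note that deleting journal edits destroys the shared edit log, so if anything, move the directory aside rather than removing it:

```shell
#!/usr/bin/env bash
# Return 0 if the directory exists and contains no entries (including dotfiles).
dir_empty() {
  [ -d "$1" ] && [ -z "$(ls -A "$1")" ]
}

JOURNAL_CURRENT=/data/hadoop/hdfs/journal/hdfsha/current
if dir_empty "$JOURNAL_CURRENT"; then
  echo "journal current dir is empty - format should proceed"
else
  # Don't delete: archive so the edits can be recovered if needed, e.g.:
  echo "journal current dir is not empty; consider archiving it first:"
  echo "  mv $JOURNAL_CURRENT ${JOURNAL_CURRENT}.bak.\$(date +%s)"
fi
```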
01-24-2018
04:31 PM
We have an old Ambari cluster, version 2.6. From the logs (under /var/log/hadoop/hdfs) we can see the error - No valid image files found. I am not sure about my solution, but does it mean that we need to delete the edits_inprogress_XXXXX files under /hadoop/hdfs/journal/hdfsha/current and then restart the standby NameNode service?
2018-01-24 16:10:27,826 ERROR namenode.NameNode (NameNode.java:main(1774)) - Failed to start namenode.
java.io.FileNotFoundException: No valid image files found
at org.apache.hadoop.hdfs.server.namenode.FSImageTransactionalStorageInspector.getLatestImages(FSImageTransactionalStorageInspector.java:165)
at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:618)
at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:289)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1045)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:703)
at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:688)
at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:752)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:992)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:976)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1701)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1769)
2018-01-24 16:10:27,829 INFO util.ExitUtil (ExitUtil.java:terminate(124)) - Exiting with status 1
2018-01-24 16:10:27,845 INFO namenode.NameNode (LogAdapter.java:info(47)) - SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at master03.sys573.com/102.14.22.29
************************************************************/
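Note that the exception comes from FSImageTransactionalStorageInspector, which inspects the NameNode's local name directory, so "No valid image files found" points at a missing or damaged local fsimage rather than the JournalNode edits; deleting edits_inprogress_* would not address it. A small sketch to see which fsimage files the name directory actually holds (the path is an assumption; check dfs.namenode.name.dir):

```shell
#!/usr/bin/env bash
# Print the highest fsimage transaction id found in a NameNode name dir.
latest_fsimage_txid() {
  ls "$1" 2>/dev/null | sed -n 's/^fsimage_\([0-9]\{1,\}\)$/\1/p' | sort -n | tail -1
}

# Hypothetical default path; verify against dfs.namenode.name.dir.
latest_fsimage_txid /hadoop/hdfs/namenode/current
```

If no fsimage_* files are listed, re-syncing the standby from the healthy NameNode (hdfs namenode -bootstrapStandby) is usually the safer recovery path.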
Labels:
- Apache Ambari
- Apache Hadoop
01-24-2018
02:38 PM
When I run it alone we get:
su hdfs -l -s /bin/bash -c 'ulimit -c unlimited ; /usr/hdp/current/hadoop-client/sbin/hadoop-daemon.sh --config /usr/hdp/current/hadoop-client/conf start namenode'
starting namenode, logging to /var/log/hadoop/hdfs/hadoop-hdfs-namenode-master03.sys57.com.out
echo $?
1
01-24-2018
02:09 PM
When we start the ZKFailoverController service, the ZooKeeper Server fails after a few seconds. For example: from Ambari, the ZooKeeper Server service is stable, but as soon as we start the ZKFailoverController service, the ZooKeeper Server fails immediately, and so does the ZKFailoverController service itself. Please advise - what could be the root cause of this?
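Since ZKFC connects to the ZooKeeper quorum at startup, one thing worth checking is whether every ZooKeeper server is actually healthy at that moment. A sketch using ZooKeeper's four-letter ruok command over plain TCP; the host names and port are assumptions, so check ha.zookeeper.quorum for the real values:

```shell
#!/usr/bin/env bash
# Send ZooKeeper's 'ruok' four-letter command; a healthy server replies 'imok'.
zk_ruok() {
  local reply
  reply=$( (exec 3<>"/dev/tcp/$1/$2" && printf 'ruok' >&3 && head -c4 <&3) 2>/dev/null )
  [ "$reply" = "imok" ]
}

# Hypothetical quorum hosts; replace with the ha.zookeeper.quorum entries.
for host in master01 master02 master03; do
  if zk_ruok "$host" 2181; then
    echo "$host: ZooKeeper ok"
  else
    echo "$host: ZooKeeper NOT responding"
  fi
done
```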
Labels:
- Apache Ambari
- Apache Hadoop
01-24-2018
02:04 PM
We are trying to start the Standby NameNode on the master03 machine, but without success. From the error log we can see the following, but we can't figure out what the problem is. Please advise what could be the reason the NameNode does not start, according to the following log:
Traceback (most recent call last):
File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/namenode.py", line 424, in <module>
NameNode().execute()
File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 314, in execute
method(env)
File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/namenode.py", line 100, in start
upgrade_suspended=params.upgrade_suspended, env=env)
File "/usr/lib/python2.6/site-packages/ambari_commons/os_family_impl.py", line 89, in thunk
return fn(*args, **kwargs)
File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/hdfs_namenode.py", line 167, in namenode
create_log_dir=True
File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/utils.py", line 271, in service
Execute(daemon_cmd, not_if=process_id_exists_command, environment=hadoop_env_exports)
File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 155, in __init__
self.env.run()
File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 160, in run
self.run_action(resource, action)
File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 124, in run_action
provider_action()
File "/usr/lib/python2.6/site-packages/resource_management/core/providers/system.py", line 262, in action_run
tries=self.resource.tries, try_sleep=self.resource.try_sleep)
File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 72, in inner
result = function(command, **kwargs)
File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 102, in checked_call
tries=tries, try_sleep=try_sleep, timeout_kill_strategy=timeout_kill_strategy)
File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 150, in _call_wrapper
result = _call(command, **kwargs_copy)
File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 303, in _call
raise ExecutionFailed(err_msg, code, out, err)
resource_management.core.exceptions.ExecutionFailed: Execution of 'ambari-sudo.sh su hdfs -l -s /bin/bash -c 'ulimit -c unlimited ; /usr/hdp/current/hadoop-client/sbin/hadoop-daemon.sh --config /usr/hdp/current/hadoop-client/conf start namenode'' returned 1. starting namenode, logging to /var/log/hadoop/hdfs/hadoop-hdfs-namenode-master03.sys57.com.out
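The .out file that Ambari points at usually holds only the ulimit banner; the actual failure reason lands in the matching .log file. A small sketch to pull the last ERROR/FATAL lines from it (log filename taken from the output above):

```shell
#!/usr/bin/env bash
# Print the last N (default 5) ERROR/FATAL lines from a Hadoop daemon log.
last_errors() {
  grep -E 'ERROR|FATAL' "$1" | tail -n "${2:-5}"
}

last_errors /var/log/hadoop/hdfs/hadoop-hdfs-namenode-master03.sys57.com.log 10
```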
Labels:
- Apache Ambari
- Apache Hadoop