Created 07-26-2018 03:18 PM
I'm having issues with zookeeper stopping a few seconds after startup. My problem, though, is that the changes made to the log4j properties through Ambari do not seem to be getting propagated. I'm trying to change rootLogger to DEBUG and change the log file name from the default to zookeeper.log. Here is the "Advanced zookeeper-log4j":
# DEFAULT: console appender only
#log4j.rootLogger=INFO, CONSOLE, ROLLINGFILE
# Example with rolling log file
log4j.rootLogger=DEBUG, CONSOLE, ROLLINGFILE
# Example with rolling log file and tracing
#log4j.rootLogger=TRACE, CONSOLE, ROLLINGFILE, TRACEFILE
#
# Log INFO level and above messages to the console
#
log4j.appender.CONSOLE=org.apache.log4j.ConsoleAppender
log4j.appender.CONSOLE.Threshold=INFO
log4j.appender.CONSOLE.layout=org.apache.log4j.PatternLayout
log4j.appender.CONSOLE.layout.ConversionPattern=%d{ISO8601} - %-5p [%t:%C{1}@%L] - %m%n
#
# Add ROLLINGFILE to rootLogger to get log file output
# Log DEBUG level and above messages to a log file
log4j.appender.ROLLINGFILE=org.apache.log4j.RollingFileAppender
log4j.appender.ROLLINGFILE.Threshold=DEBUG
log4j.appender.ROLLINGFILE.File={{zk_log_dir}}/zookeeper.log
# Max log file size of 10MB
log4j.appender.ROLLINGFILE.MaxFileSize={{zookeeper_log_max_backup_size}}MB
# uncomment the next line to limit number of backup files
#log4j.appender.ROLLINGFILE.MaxBackupIndex={{zookeeper_log_number_of_backup_files}}
log4j.appender.ROLLINGFILE.layout=org.apache.log4j.PatternLayout
log4j.appender.ROLLINGFILE.layout.ConversionPattern=%d{ISO8601} - %-5p [%t:%C{1}@%L] - %m%n
#
# Add TRACEFILE to rootLogger to get log file output
# Log DEBUG level and above messages to a log file
log4j.appender.TRACEFILE=org.apache.log4j.FileAppender
log4j.appender.TRACEFILE.Threshold=TRACE
log4j.appender.TRACEFILE.File=zookeeper_trace.log
log4j.appender.TRACEFILE.layout=org.apache.log4j.PatternLayout
### Notice we are including log4j's NDC here (%x)
log4j.appender.TRACEFILE.layout.ConversionPattern=%d{ISO8601} - %-5p [%t:%C{1}@%L][%x] - %m%n
But yet after restarting, ps -ef | grep zookeeper shows the following:
zookeep+ 18598 1 4 23:49 ? 00:00:00 /usr/jdk64/jdk1.8.0_112/bin/java -Dzookeeper.log.dir=/xxx/zk/logs -Dzookeeper.log.file=zookeeper-zookeeper-server-xxxxxxxxxxx.log -Dzookeeper.root.logger=INFO,ROLLINGFILE .....
Note that neither DEBUG nor the new log file name (zookeeper.log) has been propagated by Ambari. Is there any property missing?
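Side note: the -Dzookeeper.root.logger and -Dzookeeper.log.file values in that ps output are typically injected by the zkEnv.sh/zkServer.sh wrapper scripts rather than read back out of log4j.properties, so it may be worth checking them too. A minimal check, assuming the usual /usr/hdp/current symlink layout:
# where do the -Dzookeeper.* flags come from? (paths are an assumption based on the stock HDP layout)
grep -n "ZOO_LOG4J_PROP\|zookeeper.root.logger\|zookeeper.log.file" /usr/hdp/current/zookeeper-server/bin/zkEnv.sh /usr/hdp/current/zookeeper-server/bin/zkServer.sh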
Thanks, Alex
Created 07-27-2018 06:18 PM
Hello Alex Goron
Hm indeed, that's quite strange.
Is there any "Custom zookeeper-log4j" set? Or any config groups different than default used by zk?
Ps: I'm attaching my zookeeper-log4j.txt so you can compare with yours, or even replace it 🙂
Hope this helps!
Created 07-31-2018 03:42 AM
Thanks,
Alex
Created 07-31-2018 04:55 AM
Hi there @Alex Goron!
Hm, did you look at the zookeeper-env.sh? Maybe there's something extra set.
Btw, in the zookeeper log4j file on the ZK hosts, can you see the changes applied for DEBUG?
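For example, something like this on each ZK host should show whether the DEBUG level and the new file name actually made it to disk (the path assumes the usual HDP conf symlink):
grep -n "rootLogger\|ROLLINGFILE.File" /usr/hdp/current/zookeeper-server/conf/log4j.properties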
Created 07-31-2018 01:25 PM
Hi @Vinicius Higa Murakami, I found log4j.properties in 2 places. Since I'm relatively new to HDP, can you confirm which one is being read during startup? The log4j.properties under /etc/zookeeper/2.6.2.0-205/0/ did reflect the changes. The one under /usr/hdp/2.6.2.0-205/etc/zookeeper/conf.dist/log4j.properties did not, and the property names look a bit different (with '.' in the names).
As for zookeeper-env.sh (the one under /etc/zookeeper/2.6.2.0-205/0/), it has my current log directory listed, which is also in line with the Ambari "Advanced zookeeper-env" field.
So, assuming the config directory is /etc/zookeeper/2.6.2.0-205/0/, the good news is that it picks up the changes from Ambari; however, it does not look like they are being read.
Also, I noticed that it adds new log entries on only one host (the one that does not fail after startup). The other 2 hosts, which stay up for about 30 seconds after startup and then fail, do not get any new log entries even when I start them separately.
Thanks,
Alex
Created 07-31-2018 04:17 PM
Hello @Alex Goron!
Wow, that's weird 😞
Okay, so answering your question: it should be under /etc/zookeeper. What you have on /usr/hdp/<VERSION>/zookeeper/ should point (via symbolic link) to /etc/zookeeper. Just like this:
[root@node3 kafka]# ls -ltrah /usr/hdf/current/zookeeper-server/
total 788K
-rw-r--r-- 1 root root 782K May 23 15:45 zookeeper-3.4.6.3.1.2.0-7.jar
drwxr-xr-x 2 root root  222 Jul 21 06:35 bin
drwxr-xr-x 5 root root   42 Jul 21 06:35 doc
drwxr-xr-x 3 root root   18 Jul 21 06:35 etc
drwxr-xr-x 3 root root   18 Jul 21 06:35 man
drwxr-xr-x 2 root root 4.0K Jul 21 06:35 lib
lrwxrwxrwx 1 root root   29 Jul 21 06:35 zookeeper.jar -> zookeeper-3.4.6.3.1.2.0-7.jar
lrwxrwxrwx 1 root root   26 Jul 21 06:35 conf -> /etc/zookeeper/3.1.2.0-7/0
drwxr-xr-x 7 root root  131 Jul 21 06:35 .
drwxr-xr-x 8 root root  107 Jul 21 06:45 ..
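If in doubt, resolving the symlink chain directly also tells you which directory is actually read; this sketch assumes the usual /usr/hdp/current layout:
readlink -f /usr/hdp/current/zookeeper-server/conf
# should resolve to something like /etc/zookeeper/<VERSION>/0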
Just asking, but do you think it's possible that other services are installed on those 2 failing hosts that rely on ZK and are apart from Ambari? E.g. Kafka, Atlas, HA of other components, and so on. If so, these services usually come with an embedded ZK to launch in standalone/quickstart mode. Otherwise, we can check the ambari-agent logs to see if anything shows up.
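A quick, rough check for a stray or embedded ZooKeeper on those hosts could look like this (assumes GNU grep and iproute2's ss are available):
# any ZooKeeper-like JVMs besides the Ambari-managed one?
ps -ef | grep -i [Q]uorumPeerMain
# anything already holding the client/quorum ports?
ss -lntp | egrep '2181|2888|3888'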
Ps: one last thing, could you check whether you have any env variables set on those 2 hosts that start with ZOO? I'm also putting my zookeeper-env.sh below, to be compared with yours 🙂
export JAVA_HOME={{java64_home}}
export ZOOKEEPER_HOME={{zk_home}}
export ZOO_LOG_DIR={{zk_log_dir}}
export ZOOPIDFILE={{zk_pid_file}}
export SERVER_JVMFLAGS={{zk_server_heapsize}}
export JAVA=$JAVA_HOME/bin/java
export CLASSPATH=$CLASSPATH:/usr/share/zookeeper/*
{% if security_enabled %}
export SERVER_JVMFLAGS="$SERVER_JVMFLAGS -Djava.security.auth.login.config={{zk_server_jaas_file}}"
export CLIENT_JVMFLAGS="$CLIENT_JVMFLAGS -Djava.security.auth.login.config={{zk_client_jaas_file}}"
{% endif %}
Hope this helps!
Created 07-31-2018 07:29 PM
Thanks @Vinicius Higa Murakami, the ZOO values are set in zookeeper-env.sh the same way as yours, just with real paths, and they look the same under the Ambari config properties.
On one of the failing ZK nodes I have: Knox, HMaster, Journal Node, Data Node, Node Manager, HDFS client, and the components Ambari required me to install (Activity Explorer, HST Server, Metrics Collector, Grafana, HST Agent, Metrics Monitor).
On another ZK failing node: Journal Node, Zookeeper Failover Controller, Standby Name Node, Standby Resource Manager, Job History Server, Hive Server2, Hive Metastore, Data Node, Node Manager, Ambari Server, PostgreSQL, Spark/Spark2, Spark/Spark2 History server, Phoenix Query Server, dependencies (Metrics Monitor, HST Agent, App Timeline Server); clients: Hbase, HDFS, Hive, MR2, Pig, Slider, Spark, Tez, yarn, zookeeper
All the services are currently brought down for troubleshooting purposes. Though ZooKeeper should start up cleanly first.
Thanks,
Alex
Created 07-31-2018 08:51 PM
Got it, and what about the ambari-agent log on those hosts? Is it showing something? Like Errors or Warnings?
And could you check the first lines of the zookeeper out and log?
head -1000 /var/log/zookeeper/zookeeper.log
head -1000 /var/log/zookeeper/zookeeper-zookeeper-server-<ZK-HOST>.out
Created 08-01-2018 03:34 PM
zk-first-last-1000-log-out.tar.gz
@Vinicius Higa Murakami
The ambari-agent log was mostly complaining about node managers, which is legit as they are down. I put all the services into maintenance mode in hopes of suppressing that 'legit' noise. I did not see anything suspicious yet, for zookeeper specifically or otherwise.
As for zookeeper.log and zookeeper.out on the failing nodes, like I said earlier, they have stopped getting new entries during zookeeper startup. The last entries observed in those files are from July 19th. Just in case, I'm attaching the first and last 1000 lines of the log and out files from one of the failing nodes. But all in all I'm not sure that information is relevant for now.
Thanks,
Alex
Created 08-01-2018 09:06 PM
Hi @Alex Goron!
Could you check whether your ZK nodes have the right correlated IDs?
Here's mine e.g.
[root@node3 ~]# cat /usr/hdf/current/zookeeper-server/conf/zoo.cfg
#
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
#
#
#
clientPort=2181
initLimit=10
autopurge.purgeInterval=24
syncLimit=5
tickTime=3000
dataDir=/hadoop/zookeeper
autopurge.snapRetainCount=30
server.1=node1:2888:3888
server.2=node2:2888:3888
server.3=node3:2888:3888
[root@node3 ~]# cat /hadoop/zookeeper/myid
3
Hope this helps!
Created 08-02-2018 01:05 PM
It seems they are correlated as 1, 2, and 3:
server.1=mpp1.xxxxxx:2888:3888
server.2=mpp2.xxxxxx:2888:3888
server.3=mpp3.xxxxxxx:2888:3888
Created 08-03-2018 02:54 PM
the ids (myid) are 1, 2, and 3 respectively.
Created 08-02-2018 05:39 PM
@Alex Goron
What do you see on the /hadoop/zookeeper/myid for each zookeeper node?
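A quick way to collect them from one place, assuming passwordless ssh between the nodes and the same dataDir on all three hosts (hostnames are placeholders):
for h in mpp1 mpp2 mpp3; do echo -n "$h: "; ssh "$h" cat /hadoop/zookeeper/myid; done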
Created 08-03-2018 08:10 PM
Hey @Alex Goron!
Hm, okay so.. let's check the last ambari-agent execution for zookeeper, what do you think? It may give us a clue about what's happening here.
find /var/lib/ambari-agent/data/ -type f -name "output-*.txt" -exec ls -ltrah {} \; | awk '{print $9}' |tail -1 | xargs cat
And then it should show up something like this:
2018-08-03 19:52:26,617 - Stack Feature Version Info: Cluster Stack=2.6, Command Stack=None, Command Version=2.6.5.0-292 -> 2.6.5.0-292
2018-08-03 19:52:26,649 - Using hadoop conf dir: /usr/hdp/2.6.5.0-292/hadoop/conf
2018-08-03 19:52:26,907 - Stack Feature Version Info: Cluster Stack=2.6, Command Stack=None, Command Version=2.6.5.0-292 -> 2.6.5.0-292
2018-08-03 19:52:26,916 - Using hadoop conf dir: /usr/hdp/2.6.5.0-292/hadoop/conf
2018-08-03 19:52:26,918 - Group['livy'] {}
2018-08-03 19:52:26,920 - Group['spark'] {}
2018-08-03 19:52:26,920 - Group['hdfs'] {}
2018-08-03 19:52:26,921 - Group['zeppelin'] {}
2018-08-03 19:52:26,921 - Group['hadoop'] {}
2018-08-03 19:52:26,921 - Group['users'] {}
2018-08-03 19:52:26,922 - User['hive'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': [u'hadoop'], 'uid': None}
2018-08-03 19:52:26,923 - User['zookeeper'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': [u'hadoop'], 'uid': None}
2018-08-03 19:52:26,925 - User['atlas'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': [u'hadoop'], 'uid': None}
2018-08-03 19:52:26,926 - User['ams'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': [u'hadoop'], 'uid': None}
2018-08-03 19:52:26,927 - User['tez'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': [u'users'], 'uid': None}
2018-08-03 19:52:26,928 - User['zeppelin'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': [u'zeppelin', u'hadoop'], 'uid': None}
2018-08-03 19:52:26,930 - User['livy'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': [u'hadoop'], 'uid': None}
2018-08-03 19:52:26,931 - User['spark'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': [u'hadoop'], 'uid': None}
2018-08-03 19:52:26,932 - User['ambari-qa'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': [u'users'], 'uid': None}
2018-08-03 19:52:26,933 - User['flume'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': [u'hadoop'], 'uid': None}
2018-08-03 19:52:26,934 - User['kafka'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': [u'hadoop'], 'uid': None}
2018-08-03 19:52:26,936 - User['hdfs'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': ['hdfs'], 'uid': None}
2018-08-03 19:52:26,937 - User['sqoop'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': [u'hadoop'], 'uid': None}
2018-08-03 19:52:26,938 - User['yarn'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': [u'hadoop'], 'uid': None}
2018-08-03 19:52:26,939 - User['mapred'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': [u'hadoop'], 'uid': None}
2018-08-03 19:52:26,940 - User['hbase'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': [u'hadoop'], 'uid': None}
2018-08-03 19:52:26,942 - User['hcat'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': [u'hadoop'], 'uid': None}
2018-08-03 19:52:26,942 - File['/var/lib/ambari-agent/tmp/changeUid.sh'] {'content': StaticFile('changeToSecureUid.sh'), 'mode': 0555}
2018-08-03 19:52:26,945 - Execute['/var/lib/ambari-agent/tmp/changeUid.sh ambari-qa /tmp/hadoop-ambari-qa,/tmp/hsperfdata_ambari-qa,/home/ambari-qa,/tmp/ambari-qa,/tmp/sqoop-ambari-qa 0'] {'not_if': '(test $(id -u ambari-qa) -gt 1000) || (false)'}
2018-08-03 19:52:26,954 - Skipping Execute['/var/lib/ambari-agent/tmp/changeUid.sh ambari-qa /tmp/hadoop-ambari-qa,/tmp/hsperfdata_ambari-qa,/home/ambari-qa,/tmp/ambari-qa,/tmp/sqoop-ambari-qa 0'] due to not_if
2018-08-03 19:52:26,954 - Directory['/tmp/hbase-hbase'] {'owner': 'hbase', 'create_parents': True, 'mode': 0775, 'cd_access': 'a'}
2018-08-03 19:52:26,955 - Creating directory Directory['/tmp/hbase-hbase'] since it doesn't exist.
2018-08-03 19:52:26,955 - Changing owner for /tmp/hbase-hbase from 0 to hbase
2018-08-03 19:52:26,955 - Changing permission for /tmp/hbase-hbase from 755 to 775
2018-08-03 19:52:26,956 - File['/var/lib/ambari-agent/tmp/changeUid.sh'] {'content': StaticFile('changeToSecureUid.sh'), 'mode': 0555}
2018-08-03 19:52:26,958 - File['/var/lib/ambari-agent/tmp/changeUid.sh'] {'content': StaticFile('changeToSecureUid.sh'), 'mode': 0555}
2018-08-03 19:52:26,959 - call['/var/lib/ambari-agent/tmp/changeUid.sh hbase'] {}
2018-08-03 19:52:26,972 - call returned (0, '1013')
2018-08-03 19:52:26,973 - Execute['/var/lib/ambari-agent/tmp/changeUid.sh hbase /home/hbase,/tmp/hbase,/usr/bin/hbase,/var/log/hbase,/tmp/hbase-hbase 1013'] {'not_if': '(test $(id -u hbase) -gt 1000) || (false)'}
2018-08-03 19:52:26,980 - Skipping Execute['/var/lib/ambari-agent/tmp/changeUid.sh hbase /home/hbase,/tmp/hbase,/usr/bin/hbase,/var/log/hbase,/tmp/hbase-hbase 1013'] due to not_if
2018-08-03 19:52:26,981 - Group['hdfs'] {}
2018-08-03 19:52:26,981 - User['hdfs'] {'fetch_nonlocal_groups': True, 'groups': ['hdfs', u'hdfs']}
2018-08-03 19:52:26,982 - FS Type:
2018-08-03 19:52:26,982 - Directory['/etc/hadoop'] {'mode': 0755}
2018-08-03 19:52:27,005 - File['/usr/hdp/2.6.5.0-292/hadoop/conf/hadoop-env.sh'] {'content': InlineTemplate(...), 'owner': 'hdfs', 'group': 'hadoop'}
2018-08-03 19:52:27,006 - Directory['/var/lib/ambari-agent/tmp/hadoop_java_io_tmpdir'] {'owner': 'hdfs', 'group': 'hadoop', 'mode': 01777}
2018-08-03 19:52:27,029 - Execute[('setenforce', '0')] {'not_if': '(! which getenforce ) || (which getenforce && getenforce | grep -q Disabled)', 'sudo': True, 'only_if': 'test -f /selinux/enforce'}
2018-08-03 19:52:27,035 - Skipping Execute[('setenforce', '0')] due to not_if
2018-08-03 19:52:27,036 - Directory['/var/log/hadoop'] {'owner': 'root', 'create_parents': True, 'group': 'hadoop', 'mode': 0775, 'cd_access': 'a'}
2018-08-03 19:52:27,039 - Directory['/var/run/hadoop'] {'owner': 'root', 'create_parents': True, 'group': 'root', 'cd_access': 'a'}
2018-08-03 19:52:27,040 - Directory['/tmp/hadoop-hdfs'] {'owner': 'hdfs', 'create_parents': True, 'cd_access': 'a'}
2018-08-03 19:52:27,041 - Creating directory Directory['/tmp/hadoop-hdfs'] since it doesn't exist.
2018-08-03 19:52:27,041 - Changing owner for /tmp/hadoop-hdfs from 0 to hdfs
2018-08-03 19:52:27,048 - File['/usr/hdp/2.6.5.0-292/hadoop/conf/commons-logging.properties'] {'content': Template('commons-logging.properties.j2'), 'owner': 'hdfs'}
2018-08-03 19:52:27,051 - File['/usr/hdp/2.6.5.0-292/hadoop/conf/health_check'] {'content': Template('health_check.j2'), 'owner': 'hdfs'}
2018-08-03 19:52:27,062 - File['/usr/hdp/2.6.5.0-292/hadoop/conf/log4j.properties'] {'content': InlineTemplate(...), 'owner': 'hdfs', 'group': 'hadoop', 'mode': 0644}
2018-08-03 19:52:27,077 - File['/usr/hdp/2.6.5.0-292/hadoop/conf/hadoop-metrics2.properties'] {'content': InlineTemplate(...), 'owner': 'hdfs', 'group': 'hadoop'}
2018-08-03 19:52:27,078 - File['/usr/hdp/2.6.5.0-292/hadoop/conf/task-log4j.properties'] {'content': StaticFile('task-log4j.properties'), 'mode': 0755}
2018-08-03 19:52:27,079 - File['/usr/hdp/2.6.5.0-292/hadoop/conf/configuration.xsl'] {'owner': 'hdfs', 'group': 'hadoop'}
2018-08-03 19:52:27,085 - File['/etc/hadoop/conf/topology_mappings.data'] {'owner': 'hdfs', 'content': Template('topology_mappings.data.j2'), 'only_if': 'test -d /etc/hadoop/conf', 'group': 'hadoop', 'mode': 0644}
2018-08-03 19:52:27,091 - File['/etc/hadoop/conf/topology_script.py'] {'content': StaticFile('topology_script.py'), 'only_if': 'test -d /etc/hadoop/conf', 'mode': 0755}
2018-08-03 19:52:27,620 - Execute['source /usr/hdp/current/zookeeper-server/conf/zookeeper-env.sh ; env ZOOCFGDIR=/usr/hdp/current/zookeeper-server/conf ZOOCFG=zoo.cfg /usr/hdp/current/zookeeper-server/bin/zkServer.sh stop'] {'user': 'zookeeper'}
2018-08-03 19:52:27,723 - File['/var/run/zookeeper/zookeeper_server.pid'] {'action': ['delete']}
2018-08-03 19:52:27,724 - Pid file /var/run/zookeeper/zookeeper_server.pid is empty or does not exist
2018-08-03 19:52:27,726 - Directory['/usr/hdp/current/zookeeper-server/conf'] {'owner': 'zookeeper', 'create_parents': True, 'group': 'hadoop'}
2018-08-03 19:52:27,735 - File['/usr/hdp/current/zookeeper-server/conf/zookeeper-env.sh'] {'owner': 'zookeeper', 'content': InlineTemplate(...), 'group': 'hadoop'}
2018-08-03 19:52:27,742 - File['/usr/hdp/current/zookeeper-server/conf/zoo.cfg'] {'owner': 'zookeeper', 'content': Template('zoo.cfg.j2'), 'group': 'hadoop', 'mode': None}
2018-08-03 19:52:27,744 - File['/usr/hdp/current/zookeeper-server/conf/configuration.xsl'] {'owner': 'zookeeper', 'content': Template('configuration.xsl.j2'), 'group': 'hadoop', 'mode': None}
2018-08-03 19:52:27,745 - Directory['/var/run/zookeeper'] {'owner': 'zookeeper', 'create_parents': True, 'group': 'hadoop', 'mode': 0755}
2018-08-03 19:52:27,746 - Directory['/var/log/zookeeper'] {'owner': 'zookeeper', 'create_parents': True, 'group': 'hadoop', 'mode': 0755}
2018-08-03 19:52:27,746 - Directory['/hadoop/zookeeper'] {'owner': 'zookeeper', 'create_parents': True, 'group': 'hadoop', 'mode': 0755, 'cd_access': 'a'}
2018-08-03 19:52:27,747 - File['/hadoop/zookeeper/myid'] {'content': '1', 'mode': 0644}
2018-08-03 19:52:27,750 - File['/usr/hdp/current/zookeeper-server/conf/log4j.properties'] {'content': InlineTemplate(...), 'owner': 'zookeeper', 'group': 'hadoop', 'mode': 0644}
2018-08-03 19:52:27,751 - File['/usr/hdp/current/zookeeper-server/conf/zoo_sample.cfg'] {'owner': 'zookeeper', 'group': 'hadoop'}
2018-08-03 19:52:27,753 - Execute['source /usr/hdp/current/zookeeper-server/conf/zookeeper-env.sh ; env ZOOCFGDIR=/usr/hdp/current/zookeeper-server/conf ZOOCFG=zoo.cfg /usr/hdp/current/zookeeper-server/bin/zkServer.sh start'] {'not_if': 'ls /var/run/zookeeper/zookeeper_server.pid >/dev/null 2>&1 && ps -p `cat /var/run/zookeeper/zookeeper_server.pid` >/dev/null 2>&1', 'user': 'zookeeper'}
Command completed successfully!
What do you have for the last line (Execute line)?
Hope this helps!
Created 08-07-2018 03:51 PM
@Vinicius Higa Murakami, no mention of any zookeeper execution on any node (not even on the successful one).
Successful Node:
[root@mpp2 ~]# find /var/lib/ambari-agent/data/ -type f -name "output-*.txt" -exec ls -lrtah {} \; | awk '{print $9}' | tail -1 |xargs cat
2018-07-19 14:21:25,500 - Using hadoop conf dir: /usr/hdp/current/hadoop-client/conf
2018-07-19 14:21:25,503 - checked_call['hostid'] {}
2018-07-19 14:21:25,511 - checked_call returned (0, '7d0aa000')
2018-07-19 14:21:25,514 - Execute['/usr/sbin/ambari-metrics-monitor --config /etc/ambari-metrics-monitor/conf stop'] {'user': 'ams'}
2018-07-19 14:21:30,612 - Waiting for actual component stop
2018-07-19 14:21:30,719 - Process with pid 13997 is not running. Stale pid file at /var/run/ambari-metrics-monitor/ambari-metrics-monitor.pid
Failing Nodes:
[root@mpp1 ~]# find /var/lib/ambari-agent/data/ -type f -name "output-*.txt" -exec ls -lrtah {} \; | awk '{print $9}' | tail -1 |xargs cat | less | grep -i execute
2018-07-19 16:27:58,587 - Execute['/var/lib/ambari-agent/tmp/changeUid.sh ambari-qa /tmp/hadoop-ambari-qa,/tmp/hsperfdata_ambari-qa,/home/ambari-qa,/tmp/ambari-qa,/tmp/sqoop-ambari-qa 0'] {'not_if': '(test $(id -u ambari-qa) -gt 1000) || (false)'}
2018-07-19 16:27:58,593 - Skipping Execute['/var/lib/ambari-agent/tmp/changeUid.sh ambari-qa /tmp/hadoop-ambari-qa,/tmp/hsperfdata_ambari-qa,/home/ambari-qa,/tmp/ambari-qa,/tmp/sqoop-ambari-qa 0'] due to not_if
2018-07-19 16:27:58,607 - Execute['/var/lib/ambari-agent/tmp/changeUid.sh hbase /home/hbase,/tmp/hbase,/usr/bin/hbase,/var/log/hbase,/tmp/hbase-hbase 1014'] {'not_if': '(test $(id -u hbase) -gt 1000) || (false)'}
2018-07-19 16:27:58,612 - Skipping Execute['/var/lib/ambari-agent/tmp/changeUid.sh hbase /home/hbase,/tmp/hbase,/usr/bin/hbase,/var/log/hbase,/tmp/hbase-hbase 1014'] due to not_if
2018-07-19 16:27:58,652 - Execute[('setenforce', '0')] {'not_if': '(! which getenforce ) || (which getenforce && getenforce | grep -q Disabled)', 'sudo': True, 'only_if': 'test -f /selinux/enforce'}
2018-07-19 16:27:58,661 - Skipping Execute[('setenforce', '0')] due to not_if
2018-07-19 16:27:59,207 - Execute['ambari-sudo.sh su hdfs -l -s /bin/bash -c 'ulimit -c unlimited ; /usr/hdp/current/hadoop-client/sbin/hadoop-daemon.sh --config /usr/hdp/current/hadoop-client/conf start journalnode''] {'environment': {'HADOOP_LIBEXEC_DIR': '/usr/hdp/current/hadoop-client/libexec'}, 'not_if': 'ambari-sudo.sh -H -E test -f /var/run/hadoop/hdfs/hadoop-hdfs-journalnode.pid && ambari-sudo.sh -H -E pgrep -F /var/run/hadoop/hdfs/hadoop-hdfs-journalnode.pid'}
[root@mpp3 ~]# find /var/lib/ambari-agent/data/ -type f -name "output-*.txt" -exec ls -lrtah {} \; | awk '{print $9}' | tail -1 |xargs cat | less | grep -i execute
2018-07-19 16:31:34,081 - Execute['/var/lib/ambari-agent/tmp/changeUid.sh ambari-qa /tmp/hadoop-ambari-qa,/tmp/hsperfdata_ambari-qa,/home/ambari-qa,/tmp/ambari-qa,/tmp/sqoop-ambari-qa 0'] {'not_if': '(test $(id -u ambari-qa) -gt 1000) || (false)'}
2018-07-19 16:31:34,088 - Skipping Execute['/var/lib/ambari-agent/tmp/changeUid.sh ambari-qa /tmp/hadoop-ambari-qa,/tmp/hsperfdata_ambari-qa,/home/ambari-qa,/tmp/ambari-qa,/tmp/sqoop-ambari-qa 0'] due to not_if
2018-07-19 16:31:34,102 - Execute['/var/lib/ambari-agent/tmp/changeUid.sh hbase /home/hbase,/tmp/hbase,/usr/bin/hbase,/var/log/hbase,/tmp/hbase-hbase 1016'] {'not_if': '(test $(id -u hbase) -gt 1000) || (false)'}
2018-07-19 16:31:34,108 - Skipping Execute['/var/lib/ambari-agent/tmp/changeUid.sh hbase /home/hbase,/tmp/hbase,/usr/bin/hbase,/var/log/hbase,/tmp/hbase-hbase 1016'] due to not_if
2018-07-19 16:31:34,142 - Execute[('setenforce', '0')] {'not_if': '(! which getenforce ) || (which getenforce && getenforce | grep -q Disabled)', 'sudo': True, 'only_if': 'test -f /selinux/enforce'}
2018-07-19 16:31:34,149 - Skipping Execute[('setenforce', '0')] due to not_if
2018-07-19 16:31:34,772 - HdfsResource['/ats/done'] {'security_enabled': False, 'hadoop_bin_dir': '/usr/hdp/current/hadoop-client/bin', 'keytab': [EMPTY], 'dfs_type': '', 'default_fs': 'hdfs://Microscopic', 'hdfs_resource_ignore_file': '/var/lib/ambari-agent/data/.hdfs_resource_ignore', 'hdfs_site': ..., 'kinit_path_local': 'kinit', 'principal_name': [EMPTY], 'user': 'hdfs', 'change_permissions_for_parents': True, 'owner': 'yarn', 'group': 'hadoop', 'hadoop_conf_dir': '/usr/hdp/current/hadoop-client/conf', 'type': 'directory', 'action': ['create_on_execute'], 'immutable_paths': [u'/hdfs/nm/app-logs', u'/apps/hive/warehouse', u'/mr-history/done', u'/tmp'], 'mode': 0755}
Created 08-07-2018 08:28 PM
@Alex Goron, sorry man, my fault.
I forgot to mention that you'll need to restart ZooKeeper to see these execution commands. Every time you restart a process, the ambari-agent generates a new output file, so in your comment above you were getting the output related to other components.
Then, after you've made the restart, try to grab the output again with that pipeline containing find /var/lib/ambari-agent and so on.
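One way to make sure you grab the ZooKeeper command specifically, rather than whatever the agent ran last (paths assume the default agent data dir):
# newest agent command that mentions zookeeper, e.g. command-1270.json
ls -t /var/lib/ambari-agent/data/command-*.json | xargs grep -l -i zookeeper | head -1
# the matching execution log is then the output-*.txt with the same number in the same directory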
Hope this helps!
Created 08-16-2018 09:50 PM
@Vinicius Higa Murakami, I was trying to trace the logs while starting zookeeper at the same time; however, I see exactly the same old output from Jul 19. Nothing new was written even during this startup.
Created 08-17-2018 03:46 AM
Hi @Alex Goron!
Hmm, did you try to run the following?
Ambari > Zookeeper > Service Actions (grey button on the right corner) > Run Service Check
Ambari > Zookeeper > Service Actions (grey button on the right corner) > Download Client Configs
And also, try to restart the zookeeper and go to the host and check the output/error files
ls -ltrah /var/lib/ambari-agent/data/
Hope this helps!
Created 09-04-2018 02:52 PM
@Vinicius Higa Murakami, thanks and sorry for the delayed response. "Run Service Check" appears to be grayed out for Zookeeper, so I was unable to run it. Client configs seemed fine and have all the changes I made in Ambari.
/var/lib/ambari-agent/data/ shows error files, but they are all 0 bytes and empty on all the nodes.
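Since the error files are empty, asking the servers directly might also narrow it down; the four-letter-word commands below are standard ZooKeeper, and the hostname is a placeholder:
echo ruok | nc ZK_HOST 2181      # a healthy server answers "imok"
echo srvr | nc ZK_HOST 2181      # shows the mode (leader/follower/standalone) and basic stats
/usr/hdp/current/zookeeper-server/bin/zkServer.sh status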
Thanks,
Alex
Created 09-04-2018 04:08 PM
Hi @Alex Goron!
Hm, what about the last output file?
For example:
cat /var/lib/ambari-agent/data/output-1270.txt
It should at least show what's happening with your zookeeper log4j.
Hope this helps
Created 09-04-2018 06:39 PM
Thanks @Vinicius Higa Murakami, just checked, and on all ZK nodes the only mention of the zookeeper log4j it shows is:
File['/usr/hdp/current/zookeeper-server/conf/log4j.properties'] {'content': InlineTemplate(...), 'owner': 'zookeeper', 'group': 'hadoop', 'mode': 0644}
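One more low-tech check worth trying: if the agent really rewrites that file on every restart, its modification time should move after each restart attempt (the path assumes the usual conf symlink):
ls -l --time-style=full-iso /usr/hdp/current/zookeeper-server/conf/log4j.properties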
Thanks,
Alex