
Changes made to zookeeper-log4j through Ambari are not getting propagated

Contributor

I'm having issues with ZooKeeper being stopped a few seconds after startup. My problem, though, is that the changes made to the log4j properties through Ambari are not getting propagated. I'm trying to change rootLogger to DEBUG and change the log file name from the default to zookeeper.log. Here is the "Advanced zookeeper-log4j":

# DEFAULT: console appender only
#log4j.rootLogger=INFO, CONSOLE, ROLLINGFILE

# Example with rolling log file
log4j.rootLogger=DEBUG, CONSOLE, ROLLINGFILE

# Example with rolling log file and tracing
#log4j.rootLogger=TRACE, CONSOLE, ROLLINGFILE, TRACEFILE

#
# Log INFO level and above messages to the console
#
log4j.appender.CONSOLE=org.apache.log4j.ConsoleAppender
log4j.appender.CONSOLE.Threshold=INFO
log4j.appender.CONSOLE.layout=org.apache.log4j.PatternLayout
log4j.appender.CONSOLE.layout.ConversionPattern=%d{ISO8601} - %-5p [%t:%C{1}@%L] - %m%n

#
# Add ROLLINGFILE to rootLogger to get log file output
# Log DEBUG level and above messages to a log file
log4j.appender.ROLLINGFILE=org.apache.log4j.RollingFileAppender
log4j.appender.ROLLINGFILE.Threshold=DEBUG
log4j.appender.ROLLINGFILE.File={{zk_log_dir}}/zookeeper.log

# Max log file size of 10MB
log4j.appender.ROLLINGFILE.MaxFileSize={{zookeeper_log_max_backup_size}}MB

# uncomment the next line to limit number of backup files
#log4j.appender.ROLLINGFILE.MaxBackupIndex={{zookeeper_log_number_of_backup_files}}

log4j.appender.ROLLINGFILE.layout=org.apache.log4j.PatternLayout
log4j.appender.ROLLINGFILE.layout.ConversionPattern=%d{ISO8601} - %-5p [%t:%C{1}@%L] - %m%n

#
# Add TRACEFILE to rootLogger to get log file output
# Log DEBUG level and above messages to a log file
log4j.appender.TRACEFILE=org.apache.log4j.FileAppender
log4j.appender.TRACEFILE.Threshold=TRACE
log4j.appender.TRACEFILE.File=zookeeper_trace.log
log4j.appender.TRACEFILE.layout=org.apache.log4j.PatternLayout
### Notice we are including log4j's NDC here (%x)
log4j.appender.TRACEFILE.layout.ConversionPattern=%d{ISO8601} - %-5p [%t:%C{1}@%L][%x] - %m%n

But after restarting, ps -ef | grep zookeeper shows the following:

zookeep+ 18598 1 4 23:49 ? 00:00:00 /usr/jdk64/jdk1.8.0_112/bin/java -Dzookeeper.log.dir=/xxx/zk/logs -Dzookeeper.log.file=zookeeper-zookeeper-server-xxxxxxxxxxx.log -Dzookeeper.root.logger=INFO,ROLLINGFILE .....

Note that neither DEBUG nor the new log file name (zookeeper.log) has been propagated by Ambari. Is there any property missing?
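For reference, one way to trace where that INFO value is coming from (a rough sketch, assuming the standard ZooKeeper script locations under /usr/hdp/current/zookeeper-server) is to grep the start scripts for the system property and the variable that usually feeds it:

# where is -Dzookeeper.root.logger being set? (paths are assumptions for a default HDP layout)
grep -n -e 'zookeeper.root.logger' -e 'ZOO_LOG4J_PROP' \
  /usr/hdp/current/zookeeper-server/bin/zkServer.sh \
  /usr/hdp/current/zookeeper-server/bin/zkEnv.sh \
  /usr/hdp/current/zookeeper-server/conf/zookeeper-env.sh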

Thanks, Alex


Hello Alex Goron

Hm indeed, that's quite strange.
Is there any "Custom zookeeper-log4j" set? Or any config groups other than the default used by ZK?
Ps: I'm attaching my zookeeper-log4j.txt so you can compare it with yours, or even replace yours with it 🙂
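If you want to double-check what Ambari itself considers the desired zookeeper-log4j, a hypothetical curl sketch against the Ambari REST API (replace the host, credentials, cluster name, and tag with yours) would be:

# which config version (tag) of zookeeper-log4j is currently desired?
curl -s -u admin:admin -H 'X-Requested-By: ambari' \
  'http://<ambari-host>:8080/api/v1/clusters/<cluster>?fields=Clusters/desired_configs'

# then fetch that version's properties using the tag returned above
curl -s -u admin:admin -H 'X-Requested-By: ambari' \
  'http://<ambari-host>:8080/api/v1/clusters/<cluster>/configurations?type=zookeeper-log4j&tag=<tag>'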
Hope this helps!

Contributor
@Vinicius Higa Murakami Thanks for helping me with that. Your file looks exactly the same as mine, line by line. And no other config groups are configured. I was hoping to get into debug mode to see why ZooKeeper is failing after startup on all but one node. But I'm not sure at this point if there is any other way to look at the logs and find the reason.

Thanks,

Alex

Hi there @Alex Goron!

Hm, did you look at zookeeper-env.sh? Maybe there's something extra set.
Btw, inside the zookeeper log4j file on the ZK hosts, can you see the changes applied for DEBUG?
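For example, something like this on each ZK host (paths assume the usual HDP conf symlink) would show whether the DEBUG level and the new file name actually landed on disk:

# check the rootLogger level and the ROLLINGFILE target in the deployed file
grep -n -e 'rootLogger' -e 'ROLLINGFILE.File' /usr/hdp/current/zookeeper-server/conf/log4j.properties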

Contributor

Hi @Vinicius Higa Murakami , I found log4j.properties in 2 places. Since I'm relatively new to HDP, can you confirm which one is being read during startup? The log4j.properties under /etc/zookeeper/2.6.2.0-205/0/ did reflect the changes. The one under /usr/hdp/2.6.2.0-205/etc/zookeeper/conf.dist/log4j.properties did not, and its property names look a bit different (with '.' in the names).

As for zookeeper-env.sh (the one under /etc/zookeeper/2.6.2.0-205/0/), it has my current log directory listed, which is also in line with the Ambari "Advanced zookeeper-env" field.

So assuming the config directory is /etc/zookeeper/2.6.2.0-205/0/, the good news is that it picks up the changes from Ambari; however, it does not look like those changes are being read at startup.
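One way to be certain which conf directory the running server actually loads log4j.properties from is to inspect the live process classpath on the working node (a rough sketch, assuming the server main class is the standard org.apache.zookeeper.server.quorum.QuorumPeerMain):

pid=$(pgrep -f QuorumPeerMain | head -1)
# the conf directory that appears on the classpath is where log4j.properties is picked up from
tr '\0' '\n' < /proc/$pid/cmdline | tr ':' '\n' | grep -i 'zookeeper.*conf'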

Also, I noticed that new log entries are added on only one host (the one that does not fail after startup); the other 2 hosts, which stay up for 30 seconds or so after startup and then fail, do not get any new log entries even if I start them separately.

Thanks,

Alex

Hello @Alex Goron!
Wow, that's weird 😞
Okay, so to answer your question: it should be the one under /etc/zookeeper. What you have under /usr/hdp/<VERSION>/zookeeper/ should point (via a symbolic link) to /etc/zookeeper, just like this:

[root@node3 kafka]# ls -ltrah /usr/hdf/current/zookeeper-server/
total 788K
-rw-r--r-- 1 root root 782K May 23 15:45 zookeeper-3.4.6.3.1.2.0-7.jar
drwxr-xr-x 2 root root  222 Jul 21 06:35 bin
drwxr-xr-x 5 root root   42 Jul 21 06:35 doc
drwxr-xr-x 3 root root   18 Jul 21 06:35 etc
drwxr-xr-x 3 root root   18 Jul 21 06:35 man
drwxr-xr-x 2 root root 4.0K Jul 21 06:35 lib
lrwxrwxrwx 1 root root   29 Jul 21 06:35 zookeeper.jar -> zookeeper-3.4.6.3.1.2.0-7.jar
lrwxrwxrwx 1 root root   26 Jul 21 06:35 conf -> /etc/zookeeper/3.1.2.0-7/0
drwxr-xr-x 7 root root  131 Jul 21 06:35 .
drwxr-xr-x 8 root root  107 Jul 21 06:45 ..
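The equivalent quick check on an HDP host (a hypothetical one-liner; adjust the stack path and version to yours) would be:

# should resolve to something like /etc/zookeeper/<version>/0
readlink -f /usr/hdp/current/zookeeper-server/conf

If that does not resolve to your /etc/zookeeper/<version>/0 directory, the symlink chain is likely off.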

Just asking, but do you think it's possible that any other services installed on those 2 (failing) hosts rely on ZK apart from Ambari? E.g. Kafka, Atlas, HA of other components, and so on. If so, these services usually come with an embedded ZK to launch in standalone/quickstart mode. Otherwise, we can check the ambari-agent logs to see if anything shows up.
Ps: one last thing, could you check whether any environment variable starting with ZOO is set on those 2 hosts (see the quick check after the file below)? I'm also putting my zookeeper-env.sh below to be compared with yours 🙂


export JAVA_HOME={{java64_home}}
export ZOOKEEPER_HOME={{zk_home}}
export ZOO_LOG_DIR={{zk_log_dir}}
export ZOOPIDFILE={{zk_pid_file}}
export SERVER_JVMFLAGS={{zk_server_heapsize}}
export JAVA=$JAVA_HOME/bin/java
export CLASSPATH=$CLASSPATH:/usr/share/zookeeper/*


{% if security_enabled %}
export SERVER_JVMFLAGS="$SERVER_JVMFLAGS -Djava.security.auth.login.config={{zk_server_jaas_file}}"
export CLIENT_JVMFLAGS="$CLIENT_JVMFLAGS -Djava.security.auth.login.config={{zk_client_jaas_file}}"
{% endif %}
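As for the ZOO* variables, a quick check on each of the two failing hosts could be something like this (a sketch; the sudo form assumes the zookeeper account has a login shell):

# any ZOO* variables in the current shell?
env | grep '^ZOO'
# and in the zookeeper user's login environment
sudo -u zookeeper bash -lc "env | grep '^ZOO'"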
Hope this helps!

Contributor

Thanks @Vinicius Higa Murakami, the ZOO values are set in zookeeper-env.sh the same way as yours, just with real paths, and they look the same under the Ambari config properties.

On one of the failing ZK nodes I have: Knox, HMaster, JournalNode, DataNode, NodeManager, HDFS client, and the components I was forced to install by Ambari (Activity Explorer, HST Server, Metrics Collector, Grafana, HST Agent, Metrics Monitor).

On the other failing ZK node: JournalNode, ZooKeeper Failover Controller, Standby NameNode, Standby ResourceManager, Job History Server, HiveServer2, Hive Metastore, DataNode, NodeManager, Ambari Server, PostgreSQL, Spark/Spark2, Spark/Spark2 History Server, Phoenix Query Server, dependencies (Metrics Monitor, HST Agent, App Timeline Server); clients: HBase, HDFS, Hive, MR2, Pig, Slider, Spark, Tez, YARN, ZooKeeper.

All the services are currently brought down for troubleshooting purposes, though ZooKeeper should start up cleanly first.

Thanks,

Alex

Got it, and what about the ambari-agent log on those hosts? Is it showing something? Like Errors or Warnings?
And could you check the first lines of the ZooKeeper .out and .log files?

head -1000 /var/log/zookeeper/zookeeper.log
head -1000 /var/log/zookeeper/zookeeper-zookeeper-server-<ZK-HOST>.out

Contributor

zk-first-last-1000-log-out.tar.gz

@Vinicius Higa Murakami

The ambari-agent log was mostly complaining about the NodeManagers, which is legitimate as they are down. I put all the services into maintenance mode in hopes of suppressing that 'legit' noise. I did not see anything suspicious yet, and nothing specific to ZooKeeper either.

As for zookeeper.log and zookeeper.out on the failing nodes, as I said earlier, they stopped getting new entries during ZooKeeper startup. The last entries observed in those files were from July 19th. Just in case, I'm attaching the first and last 1000 lines of the log and out files from one of the failing nodes. But all in all, I'm not sure whether that information is relevant for now.

Thanks,

Alex

Hi @Alex Goron!
Could you check whether your ZK nodes have the right correlated ID (myid)?
Here's mine, e.g.:

[root@node3 ~]# cat /usr/hdf/current/zookeeper-server/conf/zoo.cfg 


#
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements.  See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership.  The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License.  You may obtain a copy of the License at
#
#   http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied.  See the License for the
# specific language governing permissions and limitations
# under the License.
#
#
#


clientPort=2181
initLimit=10
autopurge.purgeInterval=24
syncLimit=5
tickTime=3000
dataDir=/hadoop/zookeeper
autopurge.snapRetainCount=30
server.1=node1:2888:3888
server.2=node2:2888:3888
server.3=node3:2888:3888


[root@node3 ~]# cat /hadoop/zookeeper/myid 
3
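To compare the myid files across the whole ensemble in one pass, something like this could help (a sketch with hypothetical hostnames and the dataDir from above; adjust both):

# print each node's myid next to its hostname
for h in node1 node2 node3; do echo -n "$h: "; ssh "$h" cat /hadoop/zookeeper/myid; done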

Hope this helps!

Contributor

@Vinicius Higa Murakami ,

it seems they are correlated as 1, 2 and 3:

server.1=mpp1.xxxxxx:2888:3888

server.2=mpp2.xxxxxx:2888:3888

server.3=mpp3.xxxxxxx:2888:3888

Contributor

@Vinicius Higa Murakami

the ids (myid) are 1, 2, and 3 respectively.

@Alex Goron
What do you see on the /hadoop/zookeeper/myid for each zookeeper node?

Hey @Alex Goron!
Hm, okay so.. let's check the last ambari-agent execution for zookeeper, what do you think? It may give us a clue about what's happening here.

find /var/lib/ambari-agent/data/ -type f -name "output-*.txt" -exec ls -ltrah {} \; | awk '{print $9}' |tail -1 | xargs cat

And then it should show something like this:

2018-08-03 19:52:26,617 - Stack Feature Version Info: Cluster Stack=2.6, Command Stack=None, Command Version=2.6.5.0-292 -> 2.6.5.0-292
2018-08-03 19:52:26,649 - Using hadoop conf dir: /usr/hdp/2.6.5.0-292/hadoop/conf
2018-08-03 19:52:26,907 - Stack Feature Version Info: Cluster Stack=2.6, Command Stack=None, Command Version=2.6.5.0-292 -> 2.6.5.0-292
2018-08-03 19:52:26,916 - Using hadoop conf dir: /usr/hdp/2.6.5.0-292/hadoop/conf
2018-08-03 19:52:26,918 - Group['livy'] {}
2018-08-03 19:52:26,920 - Group['spark'] {}
2018-08-03 19:52:26,920 - Group['hdfs'] {}
2018-08-03 19:52:26,921 - Group['zeppelin'] {}
2018-08-03 19:52:26,921 - Group['hadoop'] {}
2018-08-03 19:52:26,921 - Group['users'] {}
2018-08-03 19:52:26,922 - User['hive'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': [u'hadoop'], 'uid': None}
2018-08-03 19:52:26,923 - User['zookeeper'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': [u'hadoop'], 'uid': None}
2018-08-03 19:52:26,925 - User['atlas'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': [u'hadoop'], 'uid': None}
2018-08-03 19:52:26,926 - User['ams'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': [u'hadoop'], 'uid': None}
2018-08-03 19:52:26,927 - User['tez'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': [u'users'], 'uid': None}
2018-08-03 19:52:26,928 - User['zeppelin'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': [u'zeppelin', u'hadoop'], 'uid': None}
2018-08-03 19:52:26,930 - User['livy'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': [u'hadoop'], 'uid': None}
2018-08-03 19:52:26,931 - User['spark'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': [u'hadoop'], 'uid': None}
2018-08-03 19:52:26,932 - User['ambari-qa'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': [u'users'], 'uid': None}
2018-08-03 19:52:26,933 - User['flume'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': [u'hadoop'], 'uid': None}
2018-08-03 19:52:26,934 - User['kafka'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': [u'hadoop'], 'uid': None}
2018-08-03 19:52:26,936 - User['hdfs'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': ['hdfs'], 'uid': None}
2018-08-03 19:52:26,937 - User['sqoop'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': [u'hadoop'], 'uid': None}
2018-08-03 19:52:26,938 - User['yarn'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': [u'hadoop'], 'uid': None}
2018-08-03 19:52:26,939 - User['mapred'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': [u'hadoop'], 'uid': None}
2018-08-03 19:52:26,940 - User['hbase'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': [u'hadoop'], 'uid': None}
2018-08-03 19:52:26,942 - User['hcat'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': [u'hadoop'], 'uid': None}
2018-08-03 19:52:26,942 - File['/var/lib/ambari-agent/tmp/changeUid.sh'] {'content': StaticFile('changeToSecureUid.sh'), 'mode': 0555}
2018-08-03 19:52:26,945 - Execute['/var/lib/ambari-agent/tmp/changeUid.sh ambari-qa /tmp/hadoop-ambari-qa,/tmp/hsperfdata_ambari-qa,/home/ambari-qa,/tmp/ambari-qa,/tmp/sqoop-ambari-qa 0'] {'not_if': '(test $(id -u ambari-qa) -gt 1000) || (false)'}
2018-08-03 19:52:26,954 - Skipping Execute['/var/lib/ambari-agent/tmp/changeUid.sh ambari-qa /tmp/hadoop-ambari-qa,/tmp/hsperfdata_ambari-qa,/home/ambari-qa,/tmp/ambari-qa,/tmp/sqoop-ambari-qa 0'] due to not_if
2018-08-03 19:52:26,954 - Directory['/tmp/hbase-hbase'] {'owner': 'hbase', 'create_parents': True, 'mode': 0775, 'cd_access': 'a'}
2018-08-03 19:52:26,955 - Creating directory Directory['/tmp/hbase-hbase'] since it doesn't exist.
2018-08-03 19:52:26,955 - Changing owner for /tmp/hbase-hbase from 0 to hbase
2018-08-03 19:52:26,955 - Changing permission for /tmp/hbase-hbase from 755 to 775
2018-08-03 19:52:26,956 - File['/var/lib/ambari-agent/tmp/changeUid.sh'] {'content': StaticFile('changeToSecureUid.sh'), 'mode': 0555}
2018-08-03 19:52:26,958 - File['/var/lib/ambari-agent/tmp/changeUid.sh'] {'content': StaticFile('changeToSecureUid.sh'), 'mode': 0555}
2018-08-03 19:52:26,959 - call['/var/lib/ambari-agent/tmp/changeUid.sh hbase'] {}
2018-08-03 19:52:26,972 - call returned (0, '1013')
2018-08-03 19:52:26,973 - Execute['/var/lib/ambari-agent/tmp/changeUid.sh hbase /home/hbase,/tmp/hbase,/usr/bin/hbase,/var/log/hbase,/tmp/hbase-hbase 1013'] {'not_if': '(test $(id -u hbase) -gt 1000) || (false)'}
2018-08-03 19:52:26,980 - Skipping Execute['/var/lib/ambari-agent/tmp/changeUid.sh hbase /home/hbase,/tmp/hbase,/usr/bin/hbase,/var/log/hbase,/tmp/hbase-hbase 1013'] due to not_if
2018-08-03 19:52:26,981 - Group['hdfs'] {}
2018-08-03 19:52:26,981 - User['hdfs'] {'fetch_nonlocal_groups': True, 'groups': ['hdfs', u'hdfs']}
2018-08-03 19:52:26,982 - FS Type: 
2018-08-03 19:52:26,982 - Directory['/etc/hadoop'] {'mode': 0755}
2018-08-03 19:52:27,005 - File['/usr/hdp/2.6.5.0-292/hadoop/conf/hadoop-env.sh'] {'content': InlineTemplate(...), 'owner': 'hdfs', 'group': 'hadoop'}
2018-08-03 19:52:27,006 - Directory['/var/lib/ambari-agent/tmp/hadoop_java_io_tmpdir'] {'owner': 'hdfs', 'group': 'hadoop', 'mode': 01777}
2018-08-03 19:52:27,029 - Execute[('setenforce', '0')] {'not_if': '(! which getenforce ) || (which getenforce && getenforce | grep -q Disabled)', 'sudo': True, 'only_if': 'test -f /selinux/enforce'}
2018-08-03 19:52:27,035 - Skipping Execute[('setenforce', '0')] due to not_if
2018-08-03 19:52:27,036 - Directory['/var/log/hadoop'] {'owner': 'root', 'create_parents': True, 'group': 'hadoop', 'mode': 0775, 'cd_access': 'a'}
2018-08-03 19:52:27,039 - Directory['/var/run/hadoop'] {'owner': 'root', 'create_parents': True, 'group': 'root', 'cd_access': 'a'}
2018-08-03 19:52:27,040 - Directory['/tmp/hadoop-hdfs'] {'owner': 'hdfs', 'create_parents': True, 'cd_access': 'a'}
2018-08-03 19:52:27,041 - Creating directory Directory['/tmp/hadoop-hdfs'] since it doesn't exist.
2018-08-03 19:52:27,041 - Changing owner for /tmp/hadoop-hdfs from 0 to hdfs
2018-08-03 19:52:27,048 - File['/usr/hdp/2.6.5.0-292/hadoop/conf/commons-logging.properties'] {'content': Template('commons-logging.properties.j2'), 'owner': 'hdfs'}
2018-08-03 19:52:27,051 - File['/usr/hdp/2.6.5.0-292/hadoop/conf/health_check'] {'content': Template('health_check.j2'), 'owner': 'hdfs'}
2018-08-03 19:52:27,062 - File['/usr/hdp/2.6.5.0-292/hadoop/conf/log4j.properties'] {'content': InlineTemplate(...), 'owner': 'hdfs', 'group': 'hadoop', 'mode': 0644}
2018-08-03 19:52:27,077 - File['/usr/hdp/2.6.5.0-292/hadoop/conf/hadoop-metrics2.properties'] {'content': InlineTemplate(...), 'owner': 'hdfs', 'group': 'hadoop'}
2018-08-03 19:52:27,078 - File['/usr/hdp/2.6.5.0-292/hadoop/conf/task-log4j.properties'] {'content': StaticFile('task-log4j.properties'), 'mode': 0755}
2018-08-03 19:52:27,079 - File['/usr/hdp/2.6.5.0-292/hadoop/conf/configuration.xsl'] {'owner': 'hdfs', 'group': 'hadoop'}
2018-08-03 19:52:27,085 - File['/etc/hadoop/conf/topology_mappings.data'] {'owner': 'hdfs', 'content': Template('topology_mappings.data.j2'), 'only_if': 'test -d /etc/hadoop/conf', 'group': 'hadoop', 'mode': 0644}
2018-08-03 19:52:27,091 - File['/etc/hadoop/conf/topology_script.py'] {'content': StaticFile('topology_script.py'), 'only_if': 'test -d /etc/hadoop/conf', 'mode': 0755}
2018-08-03 19:52:27,620 - Execute['source /usr/hdp/current/zookeeper-server/conf/zookeeper-env.sh ; env ZOOCFGDIR=/usr/hdp/current/zookeeper-server/conf ZOOCFG=zoo.cfg /usr/hdp/current/zookeeper-server/bin/zkServer.sh stop'] {'user': 'zookeeper'}
2018-08-03 19:52:27,723 - File['/var/run/zookeeper/zookeeper_server.pid'] {'action': ['delete']}
2018-08-03 19:52:27,724 - Pid file /var/run/zookeeper/zookeeper_server.pid is empty or does not exist
2018-08-03 19:52:27,726 - Directory['/usr/hdp/current/zookeeper-server/conf'] {'owner': 'zookeeper', 'create_parents': True, 'group': 'hadoop'}
2018-08-03 19:52:27,735 - File['/usr/hdp/current/zookeeper-server/conf/zookeeper-env.sh'] {'owner': 'zookeeper', 'content': InlineTemplate(...), 'group': 'hadoop'}
2018-08-03 19:52:27,742 - File['/usr/hdp/current/zookeeper-server/conf/zoo.cfg'] {'owner': 'zookeeper', 'content': Template('zoo.cfg.j2'), 'group': 'hadoop', 'mode': None}
2018-08-03 19:52:27,744 - File['/usr/hdp/current/zookeeper-server/conf/configuration.xsl'] {'owner': 'zookeeper', 'content': Template('configuration.xsl.j2'), 'group': 'hadoop', 'mode': None}
2018-08-03 19:52:27,745 - Directory['/var/run/zookeeper'] {'owner': 'zookeeper', 'create_parents': True, 'group': 'hadoop', 'mode': 0755}
2018-08-03 19:52:27,746 - Directory['/var/log/zookeeper'] {'owner': 'zookeeper', 'create_parents': True, 'group': 'hadoop', 'mode': 0755}
2018-08-03 19:52:27,746 - Directory['/hadoop/zookeeper'] {'owner': 'zookeeper', 'create_parents': True, 'group': 'hadoop', 'mode': 0755, 'cd_access': 'a'}
2018-08-03 19:52:27,747 - File['/hadoop/zookeeper/myid'] {'content': '1', 'mode': 0644}
2018-08-03 19:52:27,750 - File['/usr/hdp/current/zookeeper-server/conf/log4j.properties'] {'content': InlineTemplate(...), 'owner': 'zookeeper', 'group': 'hadoop', 'mode': 0644}
2018-08-03 19:52:27,751 - File['/usr/hdp/current/zookeeper-server/conf/zoo_sample.cfg'] {'owner': 'zookeeper', 'group': 'hadoop'}
2018-08-03 19:52:27,753 - Execute['source /usr/hdp/current/zookeeper-server/conf/zookeeper-env.sh ; env ZOOCFGDIR=/usr/hdp/current/zookeeper-server/conf ZOOCFG=zoo.cfg /usr/hdp/current/zookeeper-server/bin/zkServer.sh start'] {'not_if': 'ls /var/run/zookeeper/zookeeper_server.pid >/dev/null 2>&1 && ps -p `cat /var/run/zookeeper/zookeeper_server.pid` >/dev/null 2>&1', 'user': 'zookeeper'}


Command completed successfully!

What do you have for the last line (Execute line)?
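If the output is long, a quick way to pull just the ZooKeeper-related Execute lines from that same newest file (reusing the pipeline above) is:

# grep the newest ambari-agent output file for zkServer.sh invocations
find /var/lib/ambari-agent/data/ -type f -name "output-*.txt" -exec ls -ltrah {} \; \
  | awk '{print $9}' | tail -1 | xargs grep -i 'zkServer.sh'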

Hope this helps!

Contributor

@Vinicius Higa Murakami , there is no mention of any ZooKeeper execution on any node (even on the successful one).

Successful Node:

[root@mpp2 ~]# find /var/lib/ambari-agent/data/ -type f -name "output-*.txt" -exec ls -lrtah {} \; | awk '{print $9}' | tail -1 |xargs cat

2018-07-19 14:21:25,500 - Using hadoop conf dir: /usr/hdp/current/hadoop-client/conf
2018-07-19 14:21:25,503 - checked_call['hostid'] {}
2018-07-19 14:21:25,511 - checked_call returned (0, '7d0aa000')
2018-07-19 14:21:25,514 - Execute['/usr/sbin/ambari-metrics-monitor --config /etc/ambari-metrics-monitor/conf stop'] {'user': 'ams'}
2018-07-19 14:21:30,612 - Waiting for actual component stop
2018-07-19 14:21:30,719 - Process with pid 13997 is not running. Stale pid file at /var/run/ambari-metrics-monitor/ambari-metrics-monitor.pid

Failing Nodes:

[root@mpp1 ~]# find /var/lib/ambari-agent/data/ -type f -name "output-*.txt" -exec ls -lrtah {} \; | awk '{print $9}' | tail -1 |xargs cat | less | grep -i execute

2018-07-19 16:27:58,587 - Execute['/var/lib/ambari-agent/tmp/changeUid.sh ambari-qa /tmp/hadoop-ambari-qa,/tmp/hsperfdata_ambari-qa,/home/ambari-qa,/tmp/ambari-qa,/tmp/sqoop-ambari-qa 0'] {'not_if': '(test $(id -u ambari-qa) -gt 1000) || (false)'}
2018-07-19 16:27:58,593 - Skipping Execute['/var/lib/ambari-agent/tmp/changeUid.sh ambari-qa /tmp/hadoop-ambari-qa,/tmp/hsperfdata_ambari-qa,/home/ambari-qa,/tmp/ambari-qa,/tmp/sqoop-ambari-qa 0'] due to not_if
2018-07-19 16:27:58,607 - Execute['/var/lib/ambari-agent/tmp/changeUid.sh hbase /home/hbase,/tmp/hbase,/usr/bin/hbase,/var/log/hbase,/tmp/hbase-hbase 1014'] {'not_if': '(test $(id -u hbase) -gt 1000) || (false)'}
2018-07-19 16:27:58,612 - Skipping Execute['/var/lib/ambari-agent/tmp/changeUid.sh hbase /home/hbase,/tmp/hbase,/usr/bin/hbase,/var/log/hbase,/tmp/hbase-hbase 1014'] due to not_if
2018-07-19 16:27:58,652 - Execute[('setenforce', '0')] {'not_if': '(! which getenforce ) || (which getenforce && getenforce | grep -q Disabled)', 'sudo': True, 'only_if': 'test -f /selinux/enforce'}
2018-07-19 16:27:58,661 - Skipping Execute[('setenforce', '0')] due to not_if
2018-07-19 16:27:59,207 - Execute['ambari-sudo.sh su hdfs -l -s /bin/bash -c 'ulimit -c unlimited ; /usr/hdp/current/hadoop-client/sbin/hadoop-daemon.sh --config /usr/hdp/current/hadoop-client/conf start journalnode''] {'environment': {'HADOOP_LIBEXEC_DIR': '/usr/hdp/current/hadoop-client/libexec'}, 'not_if': 'ambari-sudo.sh -H -E test -f /var/run/hadoop/hdfs/hadoop-hdfs-journalnode.pid && ambari-sudo.sh -H -E pgrep -F /var/run/hadoop/hdfs/hadoop-hdfs-journalnode.pid'}

[root@mpp3 ~]# find /var/lib/ambari-agent/data/ -type f -name "output-*.txt" -exec ls -lrtah {} \; | awk '{print $9}' | tail -1 |xargs cat | less | grep -i execute

2018-07-19 16:31:34,081 - Execute['/var/lib/ambari-agent/tmp/changeUid.sh ambari-qa /tmp/hadoop-ambari-qa,/tmp/hsperfdata_ambari-qa,/home/ambari-qa,/tmp/ambari-qa,/tmp/sqoop-ambari-qa 0'] {'not_if': '(test $(id -u ambari-qa) -gt 1000) || (false)'}
2018-07-19 16:31:34,088 - Skipping Execute['/var/lib/ambari-agent/tmp/changeUid.sh ambari-qa /tmp/hadoop-ambari-qa,/tmp/hsperfdata_ambari-qa,/home/ambari-qa,/tmp/ambari-qa,/tmp/sqoop-ambari-qa 0'] due to not_if
2018-07-19 16:31:34,102 - Execute['/var/lib/ambari-agent/tmp/changeUid.sh hbase /home/hbase,/tmp/hbase,/usr/bin/hbase,/var/log/hbase,/tmp/hbase-hbase 1016'] {'not_if': '(test $(id -u hbase) -gt 1000) || (false)'}
2018-07-19 16:31:34,108 - Skipping Execute['/var/lib/ambari-agent/tmp/changeUid.sh hbase /home/hbase,/tmp/hbase,/usr/bin/hbase,/var/log/hbase,/tmp/hbase-hbase 1016'] due to not_if
2018-07-19 16:31:34,142 - Execute[('setenforce', '0')] {'not_if': '(! which getenforce ) || (which getenforce && getenforce | grep -q Disabled)', 'sudo': True, 'only_if': 'test -f /selinux/enforce'}
2018-07-19 16:31:34,149 - Skipping Execute[('setenforce', '0')] due to not_if
2018-07-19 16:31:34,772 - HdfsResource['/ats/done'] {'security_enabled': False, 'hadoop_bin_dir': '/usr/hdp/current/hadoop-client/bin', 'keytab': [EMPTY], 'dfs_type': '', 'default_fs': 'hdfs://Microscopic', 'hdfs_resource_ignore_file': '/var/lib/ambari-agent/data/.hdfs_resource_ignore', 'hdfs_site': ..., 'kinit_path_local': 'kinit', 'principal_name': [EMPTY], 'user': 'hdfs', 'change_permissions_for_parents': True, 'owner': 'yarn', 'group': 'hadoop', 'hadoop_conf_dir': '/usr/hdp/current/hadoop-client/conf', 'type': 'directory', 'action': ['create_on_execute'], 'immutable_paths': [u'/hdfs/nm/app-logs', u'/apps/hive/warehouse', u'/mr-history/done', u'/tmp'], 'mode': 0755}

@Alex Goron, sorry man, my fault.

I forgot to mention that you'll need to restart ZooKeeper to see these execution commands, because Ambari Agent generates a new output file every time you restart a process; so in your comment above you were getting output related to other components.
Then, after you do the restart, try to grab the command again with that pipeline containing find /var/lib/ambari-agent and so on.
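One simple way to catch the fresh output file as it is created (just a sketch) is to keep an eye on the data directory while you trigger the restart in Ambari:

# show the three newest command output files, refreshing every 2 seconds
watch -n 2 'ls -t /var/lib/ambari-agent/data/output-*.txt | head -3'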
Hope this helps!

Contributor

@Vinicius Higa Murakami , I was trying to trace the logs while starting ZooKeeper at the same time; however, I see exactly the same old output from Jul 19. It did not write anything new even during this startup.

Hi @Alex Goron!
Hmm, did you try to run the following?

Ambari > Zookeeper > Service Actions (grey button on the right corner) > Run Service Check

Ambari > Zookeeper > Service Actions (grey button on the right corner) > Download Client Configs

And also, try restarting ZooKeeper, then go to the host and check the output/error files:

ls -ltrah /var/lib/ambari-agent/data/
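For instance, something along these lines would dump the newest command output together with its matching errors file (a sketch; the output-*.txt / errors-*.txt pairing is assumed from the files in that directory):

# newest command output and its matching errors file
latest=$(ls -t /var/lib/ambari-agent/data/output-*.txt | head -1)
cat "$latest"
cat "${latest/output-/errors-}"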

Hope this helps!

Contributor

@Vinicius Higa Murakami, thanks, and sorry for the delayed response. "Run Service Check" appears to be grayed out for ZooKeeper, so I was unable to run it. Client configs seemed fine and have all the changes I made in Ambari.

/var/lib/ambari-agent/data/ shows error files, but they are all zero-sized (empty) on all the nodes.

Thanks,

Alex

Hi @Alex Goron!
Hm, what about the last output file?
For example:

cat /var/lib/ambari-agent/data/output-1270.txt

It should at least show what's happening with your zookeeper log4j.

Hope this helps

Contributor

Thanks @Vinicius Higa Murakami, I just checked, and on all ZK nodes the only mention of the ZooKeeper log4j is:

File['/usr/hdp/current/zookeeper-server/conf/log4j.properties'] {'content': InlineTemplate(...), 'owner': 'zookeeper', 'group': 'hadoop', 'mode': 0644}

Thanks,

Alex