
After restarting, services repeatedly show as stopped in Ambari but keep running in the background

avatar
Explorer

I am trying to start all services in Ambari at once. They start, but after a few minutes they show as stopped again while the processes keep running in the background, like this:

[root@D-9063 ~]# ps aux | grep kafka
kafka 15868 0.7 4.7 5353180 385024 ? Sl 15:02 0:09 /usr/jdk64/jdk1.8.0_40/bin/java -Xmx1G -Xms1G -server -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSClassUnloadingEnabled -XX:+CMSScavengeBeforeRemark -XX:+DisableExplicitGC -Djava.awt.headless=true -Xloggc:/var/log/kafka/kafkaServer-gc.log -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Dkafka.logs.dir=/var/log/kafka -Dlog4j.configuration=file:/usr/hdp/2.3.4.7-4/kafka/bin/../config/log4j.properties -cp :/usr/hdp/2.3.4.7-4/kafka/bin/../libs/* kafka.Kafka /usr/hdp/2.3.4.7-4/kafka/config/server.properties
root 22346 0.0 0.0 112648 952 pts/1 S+ 15:23 0:00 grep --color=auto kafka

Other services behave the same way.

Please give me a helpful answer.

1 ACCEPTED SOLUTION

avatar
Explorer

Have you checked whether those services are running in the background while Ambari still shows them as stopped?

Say for the DataNode, check these:

1. ps -ef | grep -i datanode

2. cat /var/run/hadoop/hdfs/hadoop-hdfs-datanode.pid

3. See if both IDs match. If not, kill the process, remove /var/run/hadoop/hdfs/hadoop-hdfs-datanode.pid, and start the service from Ambari.
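A minimal sketch of that check (assuming the default HDP DataNode pid file path used above; the PID in the kill command is a placeholder for whatever ps reports):

  # PID of the running DataNode process (the bracket trick keeps grep from matching itself)
  ps -ef | grep -i '[d]atanode' | awk '{print $2}'

  # PID that Ambari recorded when it last started the DataNode
  cat /var/run/hadoop/hdfs/hadoop-hdfs-datanode.pid

  # If they differ: kill the stray process, remove the stale pid file,
  # then start the service from Ambari
  kill <running_pid>
  rm /var/run/hadoop/hdfs/hadoop-hdfs-datanode.pid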

If that is not the case, check the ambari-agent log for this message:

  1. {'msg':'Unable to read structured output from /var/lib/ambari-agent/data/structured-out-status.json'}

If you see this message:

1. Stop the ambari-agent.

2. Move /var/lib/ambari-agent/data/structured-out-status.json to /tmp.

3. Start the ambari-agent.
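For reference, those steps as a sketch (ambari-agent stop/start are the standard service commands; the file path is the one from the error message above):

  ambari-agent stop
  mv /var/lib/ambari-agent/data/structured-out-status.json /tmp/
  ambari-agent start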


10 REPLIES

avatar
Super Guru

Hi @vishal patil

Can you please check whether the PID in the *.pid file matches the running process for each component that is having the issue?

ls -l /var/run/*/*.pid
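If you want to check them all in one go, a rough sketch (assuming the pid files live under /var/run as above):

  # Report whether the process recorded in each pid file is actually running
  for f in /var/run/*/*.pid; do
    pid=$(cat "$f")
    if ps -p "$pid" > /dev/null 2>&1; then
      echo "$f: $pid is running"
    else
      echo "$f: $pid is NOT running (stale pid file)"
    fi
  done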

Do you see any errors in the Ambari UI under the running tasks tab when restarting the service?

avatar
Explorer

Hi Jitendra,

In the ZooKeeper task tab I get these logs:

2016-05-31 15:55:00,868 - Directory['/var/lib/ambari-agent/data/tmp/AMBARI-artifacts/'] {'recursive': True}
2016-05-31 15:55:00,868 - File['/var/lib/ambari-agent/data/tmp/AMBARI-artifacts//jce_policy-8.zip'] {'content': DownloadSource('http://D-9063:8080/resources//jce_policy-8.zip')}
2016-05-31 15:55:00,869 - Not downloading the file from http://D-9063:8080/resources//jce_policy-8.zip, because /var/lib/ambari-agent/data/tmp/jce_policy-8.zip already exists
2016-05-31 15:55:00,869 - Group['spark'] {'ignore_failures': False}
2016-05-31 15:55:00,869 - Group['hadoop'] {'ignore_failures': False}
2016-05-31 15:55:00,869 - Group['users'] {'ignore_failures': False}
2016-05-31 15:55:00,870 - User['storm'] {'gid': 'hadoop', 'ignore_failures': False, 'groups': [u'hadoop']}
2016-05-31 15:55:00,870 - User['zookeeper'] {'gid': 'hadoop', 'ignore_failures': False, 'groups': [u'hadoop']}
2016-05-31 15:55:00,870 - User['spark'] {'gid': 'hadoop', 'ignore_failures': False, 'groups': [u'hadoop']}
2016-05-31 15:55:00,871 - User['ambari-qa'] {'gid': 'hadoop', 'ignore_failures': False, 'groups': [u'users']}
2016-05-31 15:55:00,871 - User['kafka'] {'gid': 'hadoop', 'ignore_failures': False, 'groups': [u'hadoop']}
2016-05-31 15:55:00,872 - User['hdfs'] {'gid': 'hadoop', 'ignore_failures': False, 'groups': [u'hadoop']}
2016-05-31 15:55:00,872 - User['yarn'] {'gid': 'hadoop', 'ignore_failures': False, 'groups': [u'hadoop']}
2016-05-31 15:55:00,873 - User['mapred'] {'gid': 'hadoop', 'ignore_failures': False, 'groups': [u'hadoop']}
2016-05-31 15:55:00,873 - User['hbase'] {'gid': 'hadoop', 'ignore_failures': False, 'groups': [u'hadoop']}
2016-05-31 15:55:00,873 - File['/var/lib/ambari-agent/data/tmp/changeUid.sh'] {'content': StaticFile('changeToSecureUid.sh'), 'mode': 0555}
2016-05-31 15:55:00,874 - Execute['/var/lib/ambari-agent/data/tmp/changeUid.sh ambari-qa /tmp/hadoop-ambari-qa,/tmp/hsperfdata_ambari-qa,/home/ambari-qa,/tmp/ambari-qa,/tmp/sqoop-ambari-qa'] {'not_if': '(test $(id -u ambari-qa) -gt 1000) || (false)'}
2016-05-31 15:55:00,878 - Skipping Execute['/var/lib/ambari-agent/data/tmp/changeUid.sh ambari-qa /tmp/hadoop-ambari-qa,/tmp/hsperfdata_ambari-qa,/home/ambari-qa,/tmp/ambari-qa,/tmp/sqoop-ambari-qa'] due to not_if
2016-05-31 15:55:00,878 - Directory['/tmp/hbase-hbase'] {'owner': 'hbase', 'recursive': True, 'mode': 0775, 'cd_access': 'a'}
2016-05-31 15:55:00,878 - File['/var/lib/ambari-agent/data/tmp/changeUid.sh'] {'content': StaticFile('changeToSecureUid.sh'), 'mode': 0555}
2016-05-31 15:55:00,879 - Execute['/var/lib/ambari-agent/data/tmp/changeUid.sh hbase /home/hbase,/tmp/hbase,/usr/bin/hbase,/var/log/hbase,/tmp/hbase-hbase'] {'not_if': '(test $(id -u hbase) -gt 1000) || (false)'}
2016-05-31 15:55:00,883 - Skipping Execute['/var/lib/ambari-agent/data/tmp/changeUid.sh hbase /home/hbase,/tmp/hbase,/usr/bin/hbase,/var/log/hbase,/tmp/hbase-hbase'] due to not_if
2016-05-31 15:55:00,883 - Group['hdfs'] {'ignore_failures': False}
2016-05-31 15:55:00,883 - User['hdfs'] {'ignore_failures': False, 'groups': [u'hadoop', u'hdfs']}
2016-05-31 15:55:01,077 - Skipping Execute['source /usr/hdp/current/zookeeper-server/conf/zookeeper-env.sh ; env ZOOCFGDIR=/usr/hdp/current/zookeeper-server/conf ZOOCFG=zoo.cfg /usr/hdp/current/zookeeper-server/bin/zkServer.sh start'] due to not_if
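Note: the last "Skipping Execute[... zkServer.sh start] due to not_if" line usually means the agent's start guard was satisfied, i.e. it found an existing ZooKeeper pid file whose process looks alive, so it skipped the start. A quick check, assuming the default HDP pid file location (it may differ on your install):

  cat /var/run/zookeeper/zookeeper_server.pid
  ps -p $(cat /var/run/zookeeper/zookeeper_server.pid)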

avatar
Super Guru

Hi @vishal patil

Are you trying to start or restart the service from Ambari UI?

avatar
Explorer

Yes, I am trying to restart the services from the Ambari UI.

avatar
Super Guru

Try restarting the Ambari server/agent and see if the same issue occurs.
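For example:

  ambari-server restart   # on the Ambari server host
  ambari-agent restart    # on each agent host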

avatar
Super Guru

avatar
Explorer

Hi Sagar,

I have already tried those steps, but the problem is still not solved.

avatar

Hi @vishal patil

How much memory is allocated to your machine?

I'm curious to see what kind of GC times you have in /var/log/kafka/kafkaServer-gc.log - what CMS (Concurrent Mark Sweep) pause times are you seeing?
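One quick way to pull those numbers out of that log, as a rough sketch:

  # Print the wall-clock pause ('real=') of the most recent collections
  grep -o 'real=[0-9.]* secs' /var/log/kafka/kafkaServer-gc.log | tail -20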

avatar
Explorer
The memory allocated to my machine:

Rack: /default-rack
OS: centos7 (x86_64)
Cores (CPU): 4 (4)
Disk: Data Unavailable
Memory: 7.71GB
Load Avg:
Heartbeat: a moment ago
Current Version: 2.3.4.7-4

And the GC times in /var/log/kafka/kafkaServer-gc.log:

[root@D-9063 kafka]# tail -f kafkaServer-gc.log
2016-05-31T18:06:14.776+0530: 11031.181: [GC (Allocation Failure) 11031.197: [ParNew: 278203K->6721K(306688K), 0.8398941 secs] 278203K->6721K(1014528K), 0.8564962 secs] [Times: user=0.11 sys=0.00, real=0.86 secs]
2016-05-31T19:47:32.966+0530: 17109.333: [GC (Allocation Failure) 17109.333: [ParNew: 279361K->5532K(306688K), 0.0140049 secs] 279361K->5532K(1014528K), 0.0141290 secs] [Times: user=0.05 sys=0.00, real=0.02 secs]