Services stop repeatedly after a restart in Ambari but keep running in the background

Explorer

I am trying to start all services at once in Ambari. They start, but after a few minutes Ambari shows them as stopped again, even though the processes keep running in the background. For example:

[root@D-9063 ~]# ps aux | grep kafka
kafka 15868 0.7 4.7 5353180 385024 ? Sl 15:02 0:09 /usr/jdk64/jdk1.8.0_40/bin/java -Xmx1G -Xms1G -server -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSClassUnloadingEnabled -XX:+CMSScavengeBeforeRemark -XX:+DisableExplicitGC -Djava.awt.headless=true -Xloggc:/var/log/kafka/kafkaServer-gc.log -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Dkafka.logs.dir=/var/log/kafka -Dlog4j.configuration=file:/usr/hdp/2.3.4.7-4/kafka/bin/../config/log4j.properties -cp :/usr/hdp/2.3.4.7-4/kafka/bin/../libs/* kafka.Kafka /usr/hdp/2.3.4.7-4/kafka/config/server.properties
root 22346 0.0 0.0 112648 952 pts/1 S+ 15:23 0:00 grep --color=auto kafka

The same happens with the other services.

Please help.

1 ACCEPTED SOLUTION

Explorer

Are those services running in the background while Ambari still shows them as stopped?

Taking the DataNode as an example, check the following:

1. ps -ef | grep -i datanode

2. cat /var/run/hadoop/hdfs/hadoop-hdfs-datanode.pid

3. Check whether the two PIDs match. If they do not, kill the process, remove /var/run/hadoop/hdfs/hadoop-hdfs-datanode.pid, and start the service from Ambari.
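
The comparison in step 3 can be scripted. This is only a minimal sketch: the function name pid_file_matches is made up for illustration, and the PID you pass in is whatever the ps command in step 1 reported.

```shell
# pid_file_matches FILE PID   (hypothetical helper, not part of Ambari or Hadoop)
# Exit status 0: the PID stored in FILE equals PID.
# Exit status 1: FILE holds a different PID (stale pid file).
# Exit status 2: FILE does not exist or is not readable.
pid_file_matches() {
    pfm_file="$1"
    pfm_expected="$2"
    [ -r "$pfm_file" ] || return 2
    # Strip the trailing newline and any other whitespace before comparing.
    pfm_recorded="$(tr -d '[:space:]' < "$pfm_file")"
    [ "$pfm_recorded" = "$pfm_expected" ]
}

# Example with the DataNode pid file from the steps above.
# 12345 is a placeholder for the PID that ps reported on your host.
# if pid_file_matches /var/run/hadoop/hdfs/hadoop-hdfs-datanode.pid 12345; then
#     echo "pid file matches the running process"
# else
#     echo "pid file is stale or missing"
# fi
```

A mismatch or a missing file is the case in which the kill, remove, and restart sequence from step 3 applies.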

If that is not the case, check the ambari-agent log for this message:

  {'msg':'Unable to read structured output from /var/lib/ambari-agent/data/structured-out-status.json'}

If you see this message:

1. Stop the Ambari agent.

2. Move /var/lib/ambari-agent/data/structured-out-status.json to /tmp.

3. Start the Ambari agent.
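
As a rough sketch, the log check and the agent steps could look like this in the shell. The function names are made up for illustration, and /var/log/ambari-agent/ambari-agent.log is assumed to be the agent log location (the usual default, but check your own install). Run it as root.

```shell
# agent_log_has_status_error LOGFILE   (hypothetical helper)
# Succeeds when LOGFILE contains the "Unable to read structured output"
# message for structured-out-status.json.
agent_log_has_status_error() {
    grep -qF "Unable to read structured output from /var/lib/ambari-agent/data/structured-out-status.json" "$1"
}

# reset_status_file   (hypothetical helper)
# Stops the agent, moves the status file to /tmp, and starts the agent again.
# Each step runs only if the previous one succeeded.
reset_status_file() {
    ambari-agent stop &&
    mv /var/lib/ambari-agent/data/structured-out-status.json /tmp/ &&
    ambari-agent start
}

# Usage (the log path is an assumption, see above):
# if agent_log_has_status_error /var/log/ambari-agent/ambari-agent.log; then
#     reset_status_file
# fi
```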


10 REPLIES

Super Guru

Hi @vishal patil

Can you please check whether the PID in the *.pid file matches the running process for each component that has this issue?

ls -l /var/run/*/*.pid

Do you see any errors in the Ambari UI under the running tasks tab when restarting the service?
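
To go one step further than listing the pid files, something like the following could report whether each recorded PID still belongs to a live process. It is a sketch only: report_pid_files is a made-up name, it uses the same /var/run/*/*.pid pattern as the ls command above (so pid files nested one level deeper, such as /var/run/hadoop/hdfs/hadoop-hdfs-datanode.pid, are not covered), and the /proc check assumes Linux.

```shell
# report_pid_files [ROOT]   (hypothetical helper)
# For every ROOT/*/*.pid (ROOT defaults to /var/run) print:
#   <file> <recorded pid> <running|stale|invalid>
report_pid_files() {
    rpf_root="${1:-/var/run}"
    for rpf_file in "$rpf_root"/*/*.pid; do
        [ -f "$rpf_file" ] || continue      # the glob matched nothing
        rpf_pid="$(tr -d '[:space:]' < "$rpf_file")"
        case "$rpf_pid" in
            ''|*[!0-9]*)
                rpf_state="invalid"         # empty or not a number
                ;;
            *)
                # kill -0 only tests for existence; /proc covers processes
                # owned by other users when not running as root (Linux).
                if kill -0 "$rpf_pid" 2>/dev/null || [ -d "/proc/$rpf_pid" ]; then
                    rpf_state="running"
                else
                    rpf_state="stale"
                fi
                ;;
        esac
        printf '%s %s %s\n' "$rpf_file" "$rpf_pid" "$rpf_state"
    done
}

# Usage:
# report_pid_files            # scans /var/run/*/*.pid
```

A "running" result only says that some process has that PID. It is still worth confirming with ps that it is the expected component.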

Explorer

Hi Jitendra,

The ZooKeeper task tab shows these logs:

2016-05-31 15:55:00,868 - Directory['/var/lib/ambari-agent/data/tmp/AMBARI-artifacts/'] {'recursive': True}
2016-05-31 15:55:00,868 - File['/var/lib/ambari-agent/data/tmp/AMBARI-artifacts//jce_policy-8.zip'] {'content': DownloadSource('http://D-9063:8080/resources//jce_policy-8.zip')}
2016-05-31 15:55:00,869 - Not downloading the file from http://D-9063:8080/resources//jce_policy-8.zip, because /var/lib/ambari-agent/data/tmp/jce_policy-8.zip already exists
2016-05-31 15:55:00,869 - Group['spark'] {'ignore_failures': False}
2016-05-31 15:55:00,869 - Group['hadoop'] {'ignore_failures': False}
2016-05-31 15:55:00,869 - Group['users'] {'ignore_failures': False}
2016-05-31 15:55:00,870 - User['storm'] {'gid': 'hadoop', 'ignore_failures': False, 'groups': [u'hadoop']}
2016-05-31 15:55:00,870 - User['zookeeper'] {'gid': 'hadoop', 'ignore_failures': False, 'groups': [u'hadoop']}
2016-05-31 15:55:00,870 - User['spark'] {'gid': 'hadoop', 'ignore_failures': False, 'groups': [u'hadoop']}
2016-05-31 15:55:00,871 - User['ambari-qa'] {'gid': 'hadoop', 'ignore_failures': False, 'groups': [u'users']}
2016-05-31 15:55:00,871 - User['kafka'] {'gid': 'hadoop', 'ignore_failures': False, 'groups': [u'hadoop']}
2016-05-31 15:55:00,872 - User['hdfs'] {'gid': 'hadoop', 'ignore_failures': False, 'groups': [u'hadoop']}
2016-05-31 15:55:00,872 - User['yarn'] {'gid': 'hadoop', 'ignore_failures': False, 'groups': [u'hadoop']}
2016-05-31 15:55:00,873 - User['mapred'] {'gid': 'hadoop', 'ignore_failures': False, 'groups': [u'hadoop']}
2016-05-31 15:55:00,873 - User['hbase'] {'gid': 'hadoop', 'ignore_failures': False, 'groups': [u'hadoop']}
2016-05-31 15:55:00,873 - File['/var/lib/ambari-agent/data/tmp/changeUid.sh'] {'content': StaticFile('changeToSecureUid.sh'), 'mode': 0555}
2016-05-31 15:55:00,874 - Execute['/var/lib/ambari-agent/data/tmp/changeUid.sh ambari-qa /tmp/hadoop-ambari-qa,/tmp/hsperfdata_ambari-qa,/home/ambari-qa,/tmp/ambari-qa,/tmp/sqoop-ambari-qa'] {'not_if': '(test $(id -u ambari-qa) -gt 1000) || (false)'}
2016-05-31 15:55:00,878 - Skipping Execute['/var/lib/ambari-agent/data/tmp/changeUid.sh ambari-qa /tmp/hadoop-ambari-qa,/tmp/hsperfdata_ambari-qa,/home/ambari-qa,/tmp/ambari-qa,/tmp/sqoop-ambari-qa'] due to not_if
2016-05-31 15:55:00,878 - Directory['/tmp/hbase-hbase'] {'owner': 'hbase', 'recursive': True, 'mode': 0775, 'cd_access': 'a'}
2016-05-31 15:55:00,878 - File['/var/lib/ambari-agent/data/tmp/changeUid.sh'] {'content': StaticFile('changeToSecureUid.sh'), 'mode': 0555}
2016-05-31 15:55:00,879 - Execute['/var/lib/ambari-agent/data/tmp/changeUid.sh hbase /home/hbase,/tmp/hbase,/usr/bin/hbase,/var/log/hbase,/tmp/hbase-hbase'] {'not_if': '(test $(id -u hbase) -gt 1000) || (false)'}
2016-05-31 15:55:00,883 - Skipping Execute['/var/lib/ambari-agent/data/tmp/changeUid.sh hbase /home/hbase,/tmp/hbase,/usr/bin/hbase,/var/log/hbase,/tmp/hbase-hbase'] due to not_if
2016-05-31 15:55:00,883 - Group['hdfs'] {'ignore_failures': False}
2016-05-31 15:55:00,883 - User['hdfs'] {'ignore_failures': False, 'groups': [u'hadoop', u'hdfs']}
2016-05-31 15:55:01,077 - Skipping Execute['source /usr/hdp/current/zookeeper-server/conf/zookeeper-env.sh ; env ZOOCFGDIR=/usr/hdp/current/zookeeper-server/conf ZOOCFG=zoo.cfg /usr/hdp/current/zookeeper-server/bin/zkServer.sh start'] due to not_if

Super Guru

Hi @vishal patil

Are you trying to start or restart the service from Ambari UI?

Explorer

Yes, I am trying to restart the services from the Ambari UI.

Super Guru

Try restarting the Ambari server and agent and see whether the same issue shows up.

Super Guru

Explorer

Hi Sagar,

I have already tried these steps, but the problem is still not solved.


Hi @vishal patil

How much memory is allocated to your machine?

I'm curious to see what kind of GC times you have in /var/log/kafka/kafkaServer-gc.log. What CMS (Concurrent Mark Sweep) activity are you seeing?

Explorer

Memory allocated to my machine:

Rack: /default-rack
OS: centos7 (x86_64)
Cores (CPU): 4 (4)
Disk: Data Unavailable
Memory: 7.71GB
Load Avg:
Heartbeat: a moment ago
Current Version: 2.3.4.7-4

and the GC times in /var/log/kafka/kafkaServer-gc.log are:

[root@D-9063 kafka]# tail -f kafkaServer-gc.log
2016-05-31T18:06:14.776+0530: 11031.181: [GC (Allocation Failure) 11031.197: [ParNew: 278203K->6721K(306688K), 0.8398941 secs] 278203K->6721K(1014528K), 0.8564962 secs] [Times: user=0.11 sys=0.00, real=0.86 secs]
2016-05-31T19:47:32.966+0530: 17109.333: [GC (Allocation Failure) 17109.333: [ParNew: 279361K->5532K(306688K), 0.0140049 secs] 279361K->5532K(1014528K), 0.0141290 secs] [Times: user=0.05 sys=0.00, real=0.02 secs]
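
For a longer log, the wall-clock pause of each GC event can be pulled out with a small awk filter. This is a sketch that assumes one GC event per line, in the format produced by the -XX:+PrintGCDetails and -XX:+PrintGCDateStamps flags visible in the Kafka command line. The function name gc_real_times is made up for illustration.

```shell
# gc_real_times FILE   (hypothetical helper)
# Prints "<timestamp> <real seconds>" for every line of FILE that carries
# a "real=<seconds>" field.
gc_real_times() {
    awk '
        match($0, /real=[0-9.]+/) {
            # Skip the 5 characters of "real=" and keep only the number.
            real = substr($0, RSTART + 5, RLENGTH - 5)
            ts = $1
            sub(/:$/, "", ts)       # drop the colon after the date stamp
            print ts, real
        }
    ' "$1"
}

# Usage:
# gc_real_times /var/log/kafka/kafkaServer-gc.log
```

For the two events quoted in this post it prints 2016-05-31T18:06:14.776+0530 0.86 and 2016-05-31T19:47:32.966+0530 0.02.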