Created 05-31-2016 09:54 AM
I'm trying to start all services in Ambari at once. They start, but after a few minutes they stop again in Ambari, yet the processes keep running in the background, for example:
[root@D-9063 ~]# ps aux | grep kafka
kafka 15868 0.7 4.7 5353180 385024 ? Sl 15:02 0:09 /usr/jdk64/jdk1.8.0_40/bin/java -Xmx1G -Xms1G -server -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSClassUnloadingEnabled -XX:+CMSScavengeBeforeRemark -XX:+DisableExplicitGC -Djava.awt.headless=true -Xloggc:/var/log/kafka/kafkaServer-gc.log -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Dkafka.logs.dir=/var/log/kafka -Dlog4j.configuration=file:/usr/hdp/2.3.4.7-4/kafka/bin/../config/log4j.properties -cp :/usr/hdp/2.3.4.7-4/kafka/bin/../libs/* kafka.Kafka /usr/hdp/2.3.4.7-4/kafka/config/server.properties
root 22346 0.0 0.0 112648 952 pts/1 S+ 15:23 0:00 grep --color=auto kafka
The same happens with the other services.
Please help.
Created 06-17-2016 11:17 AM
Have you checked whether those services are running in the background while Ambari still shows them as stopped?
For example, for the DataNode, check the following:
1. ps -ef | grep -i datanode
2. cat /var/run/hadoop/hdfs/hadoop-hdfs-datanode.pid
3. See whether both PIDs match. If not, kill the process, remove /var/run/hadoop/hdfs/hadoop-hdfs-datanode.pid, and start the service from Ambari.
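The comparison in steps 1 to 3 can be scripted. This is only a minimal sketch: the function name `compare_pid` is hypothetical (it is not part of Ambari or Hadoop), and it assumes a `ps` that supports `-p` and `-o args=`. Unlike `grep -i`, the pattern match here is case-sensitive.

```shell
#!/bin/sh
# compare_pid PIDFILE PATTERN   (hypothetical helper, not an Ambari tool)
# Prints "match" if the PID recorded in PIDFILE belongs to a running process
# whose command line contains PATTERN; otherwise prints "mismatch".
compare_pid() {
    pidfile=$1
    pattern=$2
    # A missing or unreadable PID file counts as a mismatch.
    [ -r "$pidfile" ] || { echo "mismatch"; return 1; }
    # Strip whitespace and newlines from the recorded PID.
    recorded=$(tr -d '[:space:]' < "$pidfile")
    # Look up the command line of the recorded PID (empty if no such process).
    cmdline=$(ps -p "$recorded" -o args= 2>/dev/null)
    case $cmdline in
        *"$pattern"*) echo "match" ;;
        *) echo "mismatch"; return 1 ;;
    esac
}
```

For the DataNode this could be called as `compare_pid /var/run/hadoop/hdfs/hadoop-hdfs-datanode.pid datanode`. A "mismatch" corresponds to the case in step 3 where the process has to be killed and the PID file removed.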
If above is not the case, check ambari-agent log for message:
If you see this message:
1. Stop the Ambari agent.
2. Move /var/lib/ambari-agent/data/structured-out-status.json to /tmp.
3. Start the Ambari agent.
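The agent reset described above could look like the following on the affected host. This is a sketch under assumptions: the function name `reset_agent_status` and the `RUN` dry-run switch are made up for illustration, and it assumes the `ambari-agent` command is on the PATH. By default the commands are only printed.

```shell
#!/bin/sh
# RUN defaults to "echo", so the commands are printed instead of executed.
# Set RUN to an empty string (as root) to actually run them.
RUN=${RUN-echo}
STATUS_FILE=/var/lib/ambari-agent/data/structured-out-status.json

# reset_agent_status (hypothetical helper): stop the agent, move the cached
# status file out of the way, then start the agent again.
reset_agent_status() {
    $RUN ambari-agent stop &&
    $RUN mv "$STATUS_FILE" /tmp/ &&
    $RUN ambari-agent start
}
```

Calling `reset_agent_status` as-is prints the three commands for review. Moving the file to /tmp rather than deleting it keeps a copy in case it needs to be inspected later.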
Created 05-31-2016 10:03 AM
Can you please check whether, for each component that has the issue, the PID in its *.pid file matches the PID of the running process?
ls -l /var/run/*/*.pid
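To go one step beyond listing the files, each recorded PID can be tested for a live process. A minimal sketch, assuming it is run as root (otherwise `kill -0` fails for processes owned by other users and they are reported as stale); the function name `list_pid_status` is hypothetical.

```shell
#!/bin/sh
# list_pid_status FILE...   (hypothetical helper)
# For every PID file given, print "<file> <pid> running" or "<file> <pid> stale".
list_pid_status() {
    for f in "$@"; do
        # Skip arguments that are not regular files (e.g. an unmatched glob).
        [ -f "$f" ] || continue
        pid=$(tr -d '[:space:]' < "$f")
        # kill -0 sends no signal; it only checks that the PID exists
        # and that we are allowed to signal it.
        if [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null; then
            echo "$f $pid running"
        else
            echo "$f $pid stale"
        fi
    done
}

# Example: list_pid_status /var/run/*/*.pid
```

A "stale" entry means the PID file points at a process that no longer exists. A "running" entry only shows that some process has that PID, not that it is the expected service.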
Do you see any error in the Ambari UI under the running tasks tab when restarting the service?
Created 05-31-2016 10:28 AM
Hi Jitendra,
In the ZooKeeper task tab I get these logs:
2016-05-31 15:55:00,868 - Directory['/var/lib/ambari-agent/data/tmp/AMBARI-artifacts/'] {'recursive': True}
2016-05-31 15:55:00,868 - File['/var/lib/ambari-agent/data/tmp/AMBARI-artifacts//jce_policy-8.zip'] {'content': DownloadSource('http://D-9063:8080/resources//jce_policy-8.zip')}
2016-05-31 15:55:00,869 - Not downloading the file from http://D-9063:8080/resources//jce_policy-8.zip, because /var/lib/ambari-agent/data/tmp/jce_policy-8.zip already exists
2016-05-31 15:55:00,869 - Group['spark'] {'ignore_failures': False}
2016-05-31 15:55:00,869 - Group['hadoop'] {'ignore_failures': False}
2016-05-31 15:55:00,869 - Group['users'] {'ignore_failures': False}
2016-05-31 15:55:00,870 - User['storm'] {'gid': 'hadoop', 'ignore_failures': False, 'groups': [u'hadoop']}
2016-05-31 15:55:00,870 - User['zookeeper'] {'gid': 'hadoop', 'ignore_failures': False, 'groups': [u'hadoop']}
2016-05-31 15:55:00,870 - User['spark'] {'gid': 'hadoop', 'ignore_failures': False, 'groups': [u'hadoop']}
2016-05-31 15:55:00,871 - User['ambari-qa'] {'gid': 'hadoop', 'ignore_failures': False, 'groups': [u'users']}
2016-05-31 15:55:00,871 - User['kafka'] {'gid': 'hadoop', 'ignore_failures': False, 'groups': [u'hadoop']}
2016-05-31 15:55:00,872 - User['hdfs'] {'gid': 'hadoop', 'ignore_failures': False, 'groups': [u'hadoop']}
2016-05-31 15:55:00,872 - User['yarn'] {'gid': 'hadoop', 'ignore_failures': False, 'groups': [u'hadoop']}
2016-05-31 15:55:00,873 - User['mapred'] {'gid': 'hadoop', 'ignore_failures': False, 'groups': [u'hadoop']}
2016-05-31 15:55:00,873 - User['hbase'] {'gid': 'hadoop', 'ignore_failures': False, 'groups': [u'hadoop']}
2016-05-31 15:55:00,873 - File['/var/lib/ambari-agent/data/tmp/changeUid.sh'] {'content': StaticFile('changeToSecureUid.sh'), 'mode': 0555}
2016-05-31 15:55:00,874 - Execute['/var/lib/ambari-agent/data/tmp/changeUid.sh ambari-qa /tmp/hadoop-ambari-qa,/tmp/hsperfdata_ambari-qa,/home/ambari-qa,/tmp/ambari-qa,/tmp/sqoop-ambari-qa'] {'not_if': '(test $(id -u ambari-qa) -gt 1000) || (false)'}
2016-05-31 15:55:00,878 - Skipping Execute['/var/lib/ambari-agent/data/tmp/changeUid.sh ambari-qa /tmp/hadoop-ambari-qa,/tmp/hsperfdata_ambari-qa,/home/ambari-qa,/tmp/ambari-qa,/tmp/sqoop-ambari-qa'] due to not_if
2016-05-31 15:55:00,878 - Directory['/tmp/hbase-hbase'] {'owner': 'hbase', 'recursive': True, 'mode': 0775, 'cd_access': 'a'}
2016-05-31 15:55:00,878 - File['/var/lib/ambari-agent/data/tmp/changeUid.sh'] {'content': StaticFile('changeToSecureUid.sh'), 'mode': 0555}
2016-05-31 15:55:00,879 - Execute['/var/lib/ambari-agent/data/tmp/changeUid.sh hbase /home/hbase,/tmp/hbase,/usr/bin/hbase,/var/log/hbase,/tmp/hbase-hbase'] {'not_if': '(test $(id -u hbase) -gt 1000) || (false)'}
2016-05-31 15:55:00,883 - Skipping Execute['/var/lib/ambari-agent/data/tmp/changeUid.sh hbase /home/hbase,/tmp/hbase,/usr/bin/hbase,/var/log/hbase,/tmp/hbase-hbase'] due to not_if
2016-05-31 15:55:00,883 - Group['hdfs'] {'ignore_failures': False}
2016-05-31 15:55:00,883 - User['hdfs'] {'ignore_failures': False, 'groups': [u'hadoop', u'hdfs']}
2016-05-31 15:55:01,077 - Skipping Execute['source /usr/hdp/current/zookeeper-server/conf/zookeeper-env.sh ; env ZOOCFGDIR=/usr/hdp/current/zookeeper-server/conf ZOOCFG=zoo.cfg /usr/hdp/current/zookeeper-server/bin/zkServer.sh start'] due to not_if
Created 05-31-2016 10:49 AM
Are you trying to start or restart the service from Ambari UI?
Created 05-31-2016 11:53 AM
Yes, I'm trying to restart the services from the Ambari UI.
Created 05-31-2016 12:06 PM
Try restarting the Ambari server and agent, and see whether the same issue still shows up.
Created 05-31-2016 10:08 AM
In addition to what @Jitendra Yadav mentioned, please check this as well:
Created 05-31-2016 10:20 AM
Hi Sagar,
I have already tried these steps, but the problem is still not solved.
Created 05-31-2016 09:24 PM
How much memory is allocated to your machine?
I'm curious to see what kind of GC times you have in /var/log/kafka/kafkaServer-gc.log. What CMS (Concurrent Mark Sweep) pauses are you seeing?
Created 06-01-2016 04:30 AM
And the GC times from /var/log/kafka/kafkaServer-gc.log:
[root@D-9063 kafka]# tail -f kafkaServer-gc.log
2016-05-31T18:06:14.776+0530: 11031.181: [GC (Allocation Failure) 11031.197: [ParNew: 278203K->6721K(306688K), 0.8398941 secs] 278203K->6721K(1014528K), 0.8564962 secs] [Times: user=0.11 sys=0.00, real=0.86 secs]
2016-05-31T19:47:32.966+0530: 17109.333: [GC (Allocation Failure) 17109.333: [ParNew: 279361K->5532K(306688K), 0.0140049 secs] 279361K->5532K(1014528K), 0.0141290 secs] [Times: user=0.05 sys=0.00, real=0.02 secs]
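To look at pause times across the whole GC log instead of just the tail, the `real=` field of each `[Times: ...]` entry can be extracted. A minimal sketch, assuming a `grep` that supports `-o` (GNU and BSD grep do) and the log format shown above; the function names `gc_pauses` and `gc_max_pause` are hypothetical.

```shell
#!/bin/sh
# gc_pauses FILE   (hypothetical helper)
# Print the wall-clock ("real") time of each GC event in seconds, one per line.
gc_pauses() {
    grep -o 'real=[0-9.]*' "$1" | cut -d= -f2
}

# gc_max_pause FILE   (hypothetical helper)
# Print the longest wall-clock GC time found in the log.
gc_max_pause() {
    # LC_ALL=C keeps "." as the decimal separator for the numeric sort.
    gc_pauses "$1" | LC_ALL=C sort -n | tail -n 1
}
```

For example, `gc_max_pause /var/log/kafka/kafkaServer-gc.log`. On the two entries posted above, `gc_pauses` prints 0.86 and 0.02.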