
Unable to start Kafka HDP 2.5 - kafka-logs invalid file ls -la ??????


New Contributor

I recently downloaded the HDP 2.5 VM. After the initial setup, all of the services (including Kafka) were up and running. I then restarted the VM and logged into Ambari. The rest of the services came back up, but Kafka did not. When I tried to start it manually from the Service Actions menu, I got the error below. Running ls -la on the log directory shows corrupted files (see the attached screenshot). Because of this, Kafka cannot start. Can someone please help me resolve this?

Output of /var/lib/ambari-agent/data/errors-415.txt

Traceback (most recent call last):
  File "/var/lib/ambari-agent/cache/common-services/KAFKA/0.8.1/package/scripts/kafka_broker.py", line 129, in <module>
    KafkaBroker().execute()
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 280, in execute
    method(env)
  File "/var/lib/ambari-agent/cache/common-services/KAFKA/0.8.1/package/scripts/kafka_broker.py", line 81, in start
    self.configure(env, upgrade_type=upgrade_type)
  File "/var/lib/ambari-agent/cache/common-services/KAFKA/0.8.1/package/scripts/kafka_broker.py", line 49, in configure
    kafka(upgrade_type=upgrade_type)
  File "/var/lib/ambari-agent/cache/common-services/KAFKA/0.8.1/package/scripts/kafka.py", line 117, in kafka
    recursive_ownership = True,
  File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 155, in __init__
    self.env.run()
  File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 160, in run
    self.run_action(resource, action)
  File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 124, in run_action
    provider_action()
  File "/usr/lib/python2.6/site-packages/resource_management/core/providers/system.py", line 199, in action_create
    recursion_follow_links=self.resource.recursion_follow_links, safemode_folders=self.resource.safemode_folders)
  File "/usr/lib/python2.6/site-packages/resource_management/core/providers/system.py", line 73, in _ensure_metadata
    sudo.chown_recursive(path, _user_entity, _group_entity, recursion_follow_links)
  File "/usr/lib/python2.6/site-packages/resource_management/core/sudo.py", line 53, in chown_recursive
    os.lchown(os.path.join(root, name), uid, gid)
OSError: [Errno 2] No such file or directory: '/kafka-logs/ATLAS_ENTITIES-0/00000000000000000000.index'
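For context, the failing step is Ambari's recursive chown over /kafka-logs: a damaged directory entry (the ?????????? lines in ls -la) is still returned by the directory listing because the name is present in the directory, but the subsequent os.lchown on it fails with errno 2. A minimal sketch of that logic, for illustration only (not the actual Ambari source):

import os
import pwd
import grp

def chown_recursive(path, user, group):
    # Resolve the numeric uid/gid, as the Ambari helper does.
    uid = pwd.getpwnam(user).pw_uid
    gid = grp.getgrnam(group).gr_gid
    for root, dirs, files in os.walk(path):
        for name in dirs + files:
            # A dangling entry is still returned by the directory
            # listing, so the walk finds the name, but lchown on it
            # raises OSError: [Errno 2] No such file or directory.
            os.lchown(os.path.join(root, name), uid, gid)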

Output of /var/lib/ambari-agent/data/output-415.txt

2016-10-03 19:11:01,056 - The hadoop conf dir /usr/hdp/current/hadoop-client/conf exists, will call conf-select on it for version 2.5.0.0-1245
2016-10-03 19:11:01,056 - Checking if need to create versioned conf dir /etc/hadoop/2.5.0.0-1245/0
2016-10-03 19:11:01,058 - call[('ambari-python-wrap', '/usr/bin/conf-select', 'create-conf-dir', '--package', 'hadoop', '--stack-version', '2.5.0.0-1245', '--conf-version', '0')] {'logoutput': False, 'sudo': True, 'quiet': False, 'stderr': -1}
2016-10-03 19:11:01,173 - call returned (1, '/etc/hadoop/2.5.0.0-1245/0 exist already', '')
2016-10-03 19:11:01,173 - checked_call[('ambari-python-wrap', '/usr/bin/conf-select', 'set-conf-dir', '--package', 'hadoop', '--stack-version', '2.5.0.0-1245', '--conf-version', '0')] {'logoutput': False, 'sudo': True, 'quiet': False}
2016-10-03 19:11:01,237 - checked_call returned (0, '')
2016-10-03 19:11:01,238 - Ensuring that hadoop has the correct symlink structure
2016-10-03 19:11:01,238 - Using hadoop conf dir: /usr/hdp/current/hadoop-client/conf
2016-10-03 19:11:01,458 - The hadoop conf dir /usr/hdp/current/hadoop-client/conf exists, will call conf-select on it for version 2.5.0.0-1245
2016-10-03 19:11:01,458 - Checking if need to create versioned conf dir /etc/hadoop/2.5.0.0-1245/0
2016-10-03 19:11:01,459 - call[('ambari-python-wrap', '/usr/bin/conf-select', 'create-conf-dir', '--package', 'hadoop', '--stack-version', '2.5.0.0-1245', '--conf-version', '0')] {'logoutput': False, 'sudo': True, 'quiet': False, 'stderr': -1}
2016-10-03 19:11:01,526 - call returned (1, '/etc/hadoop/2.5.0.0-1245/0 exist already', '')
2016-10-03 19:11:01,527 - checked_call[('ambari-python-wrap', '/usr/bin/conf-select', 'set-conf-dir', '--package', 'hadoop', '--stack-version', '2.5.0.0-1245', '--conf-version', '0')] {'logoutput': False, 'sudo': True, 'quiet': False}
2016-10-03 19:11:01,817 - checked_call returned (0, '')
2016-10-03 19:11:01,818 - Ensuring that hadoop has the correct symlink structure
2016-10-03 19:11:01,818 - Using hadoop conf dir: /usr/hdp/current/hadoop-client/conf
2016-10-03 19:11:01,820 - Group['hadoop'] {}
2016-10-03 19:11:01,841 - Group['users'] {}
2016-10-03 19:11:01,841 - Group['zeppelin'] {}
2016-10-03 19:11:01,841 - Group['knox'] {}
2016-10-03 19:11:01,842 - Group['ranger'] {}
2016-10-03 19:11:01,842 - Group['spark'] {}
2016-10-03 19:11:01,842 - Group['livy'] {}
2016-10-03 19:11:01,842 - User['oozie'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': ['users']}
2016-10-03 19:11:01,844 - User['hive'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': ['hadoop']}
2016-10-03 19:11:01,845 - User['zeppelin'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': ['hadoop']}
2016-10-03 19:11:01,845 - User['ambari-qa'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': ['users']}
2016-10-03 19:11:01,846 - User['flume'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': ['hadoop']}
2016-10-03 19:11:01,847 - User['hdfs'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': ['hadoop']}
2016-10-03 19:11:01,848 - User['knox'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': ['hadoop']}
2016-10-03 19:11:01,848 - User['ranger'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': ['ranger']}
2016-10-03 19:11:01,849 - User['infra-solr'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': ['hadoop']}
2016-10-03 19:11:01,850 - User['storm'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': ['hadoop']}
2016-10-03 19:11:01,851 - User['spark'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': ['hadoop']}
2016-10-03 19:11:01,852 - User['livy'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': ['hadoop']}
2016-10-03 19:11:01,852 - User['mapred'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': ['hadoop']}
2016-10-03 19:11:01,853 - User['hbase'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': ['hadoop']}
2016-10-03 19:11:01,854 - User['tez'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': ['users']}
2016-10-03 19:11:01,855 - User['zookeeper'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': ['hadoop']}
2016-10-03 19:11:01,856 - User['kafka'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': ['hadoop']}
2016-10-03 19:11:01,856 - User['falcon'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': ['users']}
2016-10-03 19:11:01,857 - User['sqoop'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': ['hadoop']}
2016-10-03 19:11:01,858 - User['yarn'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': ['hadoop']}
2016-10-03 19:11:01,859 - User['hcat'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': ['hadoop']}
2016-10-03 19:11:01,859 - User['ams'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': ['hadoop']}
2016-10-03 19:11:01,860 - User['atlas'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': ['hadoop']}
2016-10-03 19:11:01,861 - File['/var/lib/ambari-agent/tmp/changeUid.sh'] {'content': StaticFile('changeToSecureUid.sh'), 'mode': 0555}
2016-10-03 19:11:02,348 - Execute['/var/lib/ambari-agent/tmp/changeUid.sh ambari-qa /tmp/hadoop-ambari-qa,/tmp/hsperfdata_ambari-qa,/home/ambari-qa,/tmp/ambari-qa,/tmp/sqoop-ambari-qa'] {'not_if': '(test $(id -u ambari-qa) -gt 1000) || (false)'}
2016-10-03 19:11:02,895 - Skipping Execute['/var/lib/ambari-agent/tmp/changeUid.sh ambari-qa /tmp/hadoop-ambari-qa,/tmp/hsperfdata_ambari-qa,/home/ambari-qa,/tmp/ambari-qa,/tmp/sqoop-ambari-qa'] due to not_if
2016-10-03 19:11:02,895 - Directory['/tmp/hbase-hbase'] {'owner': 'hbase', 'create_parents': True, 'mode': 0775, 'cd_access': 'a'}
2016-10-03 19:11:02,911 - File['/var/lib/ambari-agent/tmp/changeUid.sh'] {'content': StaticFile('changeToSecureUid.sh'), 'mode': 0555}
2016-10-03 19:11:02,913 - Execute['/var/lib/ambari-agent/tmp/changeUid.sh hbase /home/hbase,/tmp/hbase,/usr/bin/hbase,/var/log/hbase,/tmp/hbase-hbase'] {'not_if': '(test $(id -u hbase) -gt 1000) || (false)'}
2016-10-03 19:11:02,929 - Skipping Execute['/var/lib/ambari-agent/tmp/changeUid.sh hbase /home/hbase,/tmp/hbase,/usr/bin/hbase,/var/log/hbase,/tmp/hbase-hbase'] due to not_if
2016-10-03 19:11:02,930 - Group['hdfs'] {}
2016-10-03 19:11:02,930 - User['hdfs'] {'fetch_nonlocal_groups': True, 'groups': ['hadoop', 'hdfs']}
2016-10-03 19:11:02,931 - FS Type: 
2016-10-03 19:11:02,931 - Directory['/etc/hadoop'] {'mode': 0755}
2016-10-03 19:11:02,955 - File['/usr/hdp/current/hadoop-client/conf/hadoop-env.sh'] {'content': InlineTemplate(...), 'owner': 'hdfs', 'group': 'hadoop'}
2016-10-03 19:11:02,982 - Directory['/var/lib/ambari-agent/tmp/hadoop_java_io_tmpdir'] {'owner': 'hdfs', 'group': 'hadoop', 'mode': 01777}
2016-10-03 19:11:03,008 - Execute[('setenforce', '0')] {'not_if': '(! which getenforce ) || (which getenforce && getenforce | grep -q Disabled)', 'sudo': True, 'only_if': 'test -f /selinux/enforce'}
2016-10-03 19:11:03,163 - Skipping Execute[('setenforce', '0')] due to not_if
2016-10-03 19:11:03,163 - Directory['/var/log/hadoop'] {'owner': 'root', 'create_parents': True, 'group': 'hadoop', 'mode': 0775, 'cd_access': 'a'}
2016-10-03 19:11:03,166 - Directory['/var/run/hadoop'] {'owner': 'root', 'create_parents': True, 'group': 'root', 'cd_access': 'a'}
2016-10-03 19:11:03,166 - Directory['/tmp/hadoop-hdfs'] {'owner': 'hdfs', 'create_parents': True, 'cd_access': 'a'}
2016-10-03 19:11:03,239 - File['/usr/hdp/current/hadoop-client/conf/commons-logging.properties'] {'content': Template('commons-logging.properties.j2'), 'owner': 'hdfs'}
2016-10-03 19:11:03,276 - File['/usr/hdp/current/hadoop-client/conf/health_check'] {'content': Template('health_check.j2'), 'owner': 'hdfs'}
2016-10-03 19:11:03,292 - File['/usr/hdp/current/hadoop-client/conf/log4j.properties'] {'content': ..., 'owner': 'hdfs', 'group': 'hadoop', 'mode': 0644}
2016-10-03 19:11:03,325 - File['/usr/hdp/current/hadoop-client/conf/hadoop-metrics2.properties'] {'content': Template('hadoop-metrics2.properties.j2'), 'owner': 'hdfs', 'group': 'hadoop'}
2016-10-03 19:11:03,340 - File['/usr/hdp/current/hadoop-client/conf/task-log4j.properties'] {'content': StaticFile('task-log4j.properties'), 'mode': 0755}
2016-10-03 19:11:03,399 - File['/usr/hdp/current/hadoop-client/conf/configuration.xsl'] {'owner': 'hdfs', 'group': 'hadoop'}
2016-10-03 19:11:03,405 - File['/etc/hadoop/conf/topology_mappings.data'] {'owner': 'hdfs', 'content': Template('topology_mappings.data.j2'), 'only_if': 'test -d /etc/hadoop/conf', 'group': 'hadoop'}
2016-10-03 19:11:03,417 - File['/etc/hadoop/conf/topology_script.py'] {'content': StaticFile('topology_script.py'), 'only_if': 'test -d /etc/hadoop/conf', 'mode': 0755}
2016-10-03 19:11:03,684 - Stack Feature Version Info: stack_version=2.5, version=2.5.0.0-1245, current_cluster_version=2.5.0.0-1245 -> 2.5.0.0-1245
2016-10-03 19:11:03,687 - call['ambari-python-wrap /usr/bin/hdp-select status kafka-broker'] {'timeout': 20}
2016-10-03 19:11:03,780 - call returned (0, 'kafka-broker - 2.5.0.0-1245')
2016-10-03 19:11:03,782 - The hadoop conf dir /usr/hdp/current/hadoop-client/conf exists, will call conf-select on it for version 2.5.0.0-1245
2016-10-03 19:11:03,782 - Checking if need to create versioned conf dir /etc/hadoop/2.5.0.0-1245/0
2016-10-03 19:11:03,783 - call[('ambari-python-wrap', '/usr/bin/conf-select', 'create-conf-dir', '--package', 'hadoop', '--stack-version', '2.5.0.0-1245', '--conf-version', '0')] {'logoutput': False, 'sudo': True, 'quiet': False, 'stderr': -1}
2016-10-03 19:11:03,834 - call returned (1, '/etc/hadoop/2.5.0.0-1245/0 exist already', '')
2016-10-03 19:11:03,835 - checked_call[('ambari-python-wrap', '/usr/bin/conf-select', 'set-conf-dir', '--package', 'hadoop', '--stack-version', '2.5.0.0-1245', '--conf-version', '0')] {'logoutput': False, 'sudo': True, 'quiet': False}
2016-10-03 19:11:03,875 - checked_call returned (0, '')
2016-10-03 19:11:03,876 - Ensuring that hadoop has the correct symlink structure
2016-10-03 19:11:03,876 - Using hadoop conf dir: /usr/hdp/current/hadoop-client/conf
2016-10-03 19:11:03,880 - Directory['/var/log/kafka'] {'group': 'hadoop', 'cd_access': 'a', 'create_parents': True, 'recursive_ownership': True, 'owner': 'kafka', 'mode': 0755}
2016-10-03 19:11:03,961 - Directory['/var/run/kafka'] {'group': 'hadoop', 'cd_access': 'a', 'create_parents': True, 'recursive_ownership': True, 'owner': 'kafka', 'mode': 0755}
2016-10-03 19:11:03,962 - Directory['/usr/hdp/current/kafka-broker/config'] {'group': 'hadoop', 'cd_access': 'a', 'create_parents': True, 'mode': 0755, 'owner': 'kafka', 'recursive_ownership': True}
2016-10-03 19:11:03,973 - Effective stack version: 2.5.0.0
2016-10-03 19:11:03,974 - Kafka listeners: PLAINTEXT://sandbox.hortonworks.com:6667
2016-10-03 19:11:03,975 - Directory['/kafka-logs'] {'group': 'hadoop', 'cd_access': 'a', 'create_parents': True, 'mode': 0755, 'owner': 'kafka', 'recursive_ownership': True}

Command failed after 1 tries


Attachments: 8230-01-kafka-log-corrupted-files.png, 8231-02-ambari-stack.png

Regards

Sumit Udani

6 Replies

Re: Unable to start Kafka HDP 2.5 - kafka-logs invalid file ls -la ??????

Mentor

When you shut down the VM, please use the command "shutdown -h now". Powering the VM off without a clean shutdown can leave damaged entries on the filesystem, which is what the ? columns in the ls -la output indicate.

Re: Unable to start Kafka HDP 2.5 - kafka-logs invalid file ls -la ??????

New Contributor

Hi,

Thanks for the help.

Yes, I used the shutdown -h now command. To double-check, I created a new VM and hit the same error after a restart. I will give it one more try.

Re: Unable to start Kafka HDP 2.5 - kafka-logs invalid file ls -la ??????

Cloudera Employee

You can try the following workaround:

In the Ambari config (Kafka -> Advanced), change the log directory reference (the log.dirs property) from /kafka-logs to /kafka-logs2.

From a PuTTY session as root:

# mkdir /kafka-logs2

# chown kafka:hadoop /kafka-logs2
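Note that this abandons whatever is left under the old /kafka-logs, which is usually acceptable on a sandbox; the broker recreates its topic directories under the new path on the next start.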

Re: Unable to start Kafka HDP 2.5 - kafka-logs invalid file ls -la ??????

Explorer

This worked for me.

In the /kafka-logs/ATLAS_ENTITIES-0 directory, recreate the missing files:

# touch 00000000000000000000.index

# chown kafka:hadoop 00000000000000000000.index

# touch 00000000000000000000.log.deleted

# chown kafka:hadoop 00000000000000000000.log.deleted

# touch 00000000000000000000.log

# chown kafka:hadoop 00000000000000000000.log

Follow the same steps in the ATLAS_HOOK-0 directory. Now you should be able to start the Kafka process.

/Artur
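The manual touch/chown repair above can also be scripted. A minimal Python sketch (hypothetical, not from this thread; it assumes the broker is stopped, the script runs as root, and kafka:hadoop is the desired ownership):

import os
import pwd
import grp

LOG_DIR = '/kafka-logs'  # default log.dirs on the sandbox, per this thread
uid = pwd.getpwnam('kafka').pw_uid
gid = grp.getgrnam('hadoop').gr_gid

for root, dirs, files in os.walk(LOG_DIR):
    for name in dirs + files:
        path = os.path.join(root, name)
        try:
            os.lstat(path)  # a damaged entry is listed but fails to stat
        except OSError:
            # Recreate the entry as an empty file, as the touch commands
            # above do, then hand it back to kafka:hadoop.
            open(path, 'w').close()
            os.chown(path, uid, gid)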

Re: Unable to start Kafka HDP 2.5 - kafka-logs invalid file ls -la ??????

This still seems to be an issue with the 2.6 Docker Sandbox. I did as you suggested (shown below) and was able to get everything running -- THANKS!

[root@sandbox ~]# cd /kafka-logs/
[root@sandbox kafka-logs]# cd ATLAS_HOOK-0/
[root@sandbox ATLAS_HOOK-0]# ls -l
ls: cannot access 00000000000000000000.index: No such file or directory
ls: cannot access 00000000000000000000.log: No such file or directory
ls: cannot access 00000000000000000000.timeindex: No such file or directory
total 0
??????????? ? ?     ?      ?            ? 00000000000000000000.index
??????????? ? ?     ?      ?            ? 00000000000000000000.log
??????????? ? ?     ?      ?            ? 00000000000000000000.timeindex
-rw-r--r--. 1 kafka hadoop 0 Jun  2 17:45 00000000000000000027.index
-rw-r--r--. 1 kafka hadoop 0 Jun  2 17:10 00000000000000000027.log
-rw-r--r--. 1 kafka hadoop 0 Jun  2 17:45 00000000000000000027.timeindex
[root@sandbox ATLAS_HOOK-0]# touch 00000000000000000000.index
[root@sandbox ATLAS_HOOK-0]# chown kafka:hadoop 00000000000000000000.index
[root@sandbox ATLAS_HOOK-0]# touch 00000000000000000000.log
[root@sandbox ATLAS_HOOK-0]# chown kafka:hadoop 00000000000000000000.log
[root@sandbox ATLAS_HOOK-0]# touch 00000000000000000000.timeindex
[root@sandbox ATLAS_HOOK-0]# chown kafka:hadoop 00000000000000000000.timeindex
[root@sandbox ATLAS_HOOK-0]# ls -l
total 0
-rw-r--r--. 1 kafka hadoop 0 Jun  2 18:50 00000000000000000000.index
-rw-r--r--. 1 kafka hadoop 0 Jun  2 18:51 00000000000000000000.log
-rw-r--r--. 1 kafka hadoop 0 Jun  2 18:51 00000000000000000000.timeindex
-rw-r--r--. 1 kafka hadoop 0 Jun  2 17:45 00000000000000000027.index
-rw-r--r--. 1 kafka hadoop 0 Jun  2 17:10 00000000000000000027.log
-rw-r--r--. 1 kafka hadoop 0 Jun  2 17:45 00000000000000000027.timeindex
[root@sandbox ATLAS_HOOK-0]# 

Re: Unable to start Kafka HDP 2.5 - kafka-logs invalid file ls -la ??????

A follow-up to this one, at least from my perspective: it seems I found a Kafka bug in this scenario. My console producer script was using the hostname localhost instead of sandbox.hortonworks.com, which listens on a different IP. According to @Edgar Orendain, this is what caused my .log/.timeindex/.index files to become corrupted. I validated on a fresh 2.6 Docker Sandbox that everything worked fine once I used the full hostname (it had failed twice in the same way when I used localhost). Edgar said he will file a bug, since it is surprising that this causes so much trouble. Thanks for all the help on this one, Edgar!
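For reference, the producer-side fix is simply to address the broker by the hostname it advertises (PLAINTEXT://sandbox.hortonworks.com:6667 in the log output above) rather than localhost. An illustrative example using the kafka-python client, not the console producer script from this reply (it assumes kafka-python is installed and a topic named test exists or topic auto-creation is enabled):

from kafka import KafkaProducer

# Connect using the hostname the broker advertises in its listeners
# config (PLAINTEXT://sandbox.hortonworks.com:6667), not localhost.
producer = KafkaProducer(bootstrap_servers='sandbox.hortonworks.com:6667')
producer.send('test', b'hello from the sandbox')
producer.flush()
producer.close()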