Member since: 07-26-2018
Posts: 25
Kudos Received: 1
Solutions Accepted: 1
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 1912 | 11-02-2018 01:06 AM
03-20-2019
06:13 PM
@Felix Albani , I'm currently on HDP 2.6.5 and would like to install, or preferably upgrade to, Hive 2.1. Do you happen to know of any documentation on how to perform the install/upgrade? Unfortunately I cannot see Hive 2.1 in the list of services that I can add.
... View more
11-02-2018
01:06 AM
After doing some more research on the absence of a valid TGT, I found that the issue was really in default_ccache_name being set to KEYRING:persistent:%{uid} in krb5.conf. I realized I was hitting this specific issue while reading this thread. For whatever reason Hadoop has a problem with the KEYRING credential cache. Setting default_ccache_name to a FILE cache resolved the issue: appropriate TGTs are being provided now, and the NameNode no longer takes that long to start and does not fail anymore. My updated parameter looks like this: default_ccache_name=FILE:/tmp/krb5cc_%{uid} I have also propagated the config file throughout the cluster.
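For reference, a minimal sketch of the relevant [libdefaults] section after the change (the realm and the surrounding settings here are illustrative placeholders; only default_ccache_name is the actual fix):
[libdefaults]
  default_realm = EXAMPLE.COM
  dns_lookup_realm = false
  dns_lookup_kdc = false
  ticket_lifetime = 24h
  renew_lifetime = 7d
  forwardable = true
  # FILE-based cache instead of KEYRING:persistent:%{uid}
  default_ccache_name = FILE:/tmp/krb5cc_%{uid}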
... View more
10-29-2018
03:19 AM
Just wanted to add a couple of notes to the above. I have just installed Zeppelin Notebook on one of the cluster nodes. After the installation I noticed there is a need to restart the NameNode, Secondary NameNode and MapReduce2. The NameNode was restarting for 30 minutes with exactly the same symptoms as in the above log, but this time it failed. I'm still digging and trying to understand why it is happening, but I do have a couple of questions in the meantime: 1. Why is there a need to restart these services after the Zeppelin Notebook installation? I'm not sure I follow what these dependencies are. 2. What could be the reason that the TGT is not found?
... View more
10-28-2018
08:41 PM
I have just enabled Kerberos on the Hadoop cluster. The whole process went fairly smoothly. However, after the required restart of all the services, I noticed that it took over 30 minutes for the NameNode to start up. During those 30 minutes it seems that hdfs did not have a valid TGT, based on the messages below. After patiently waiting and thinking it was going to fail at any moment, it in fact DID come up. My question is: why did it take so long, and what was the problem with obtaining a TGT from the very beginning?
2018-10-27 23:28:54,899 - Retrying after 10 seconds. Reason: Execution of '/usr/hdp/current/hadoop-hdfs-namenode/bin/hdfs dfsadmin -fs hdfs://*******:8020 -safemode get | grep 'Safe mode is OFF'' returned 1.
18/10/27 23:28:54 WARN ipc.Client: Exception encountered while connecting to the server : javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
safemode: Failed on local exception: java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]; Host Details : local host is: "*******/11*.11*.11*.11*"; destination host is: "***.***.***":8020;
18/10/27 23:29:09 WARN ipc.Client: Exception encountered while connecting to the server : javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
safemode: Failed on local exception: java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]; Host Details : local host is: "*******/11*.11*.11*.11*"; destination host is: "************":8020;
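Side note for anyone hitting the same symptom: a quick way to check whether the hdfs service principal can obtain a TGT at all is to kinit manually with the headless keytab and inspect the credential cache. The keytab path and principal below are the usual HDP defaults, not copied from my cluster:
# list the principals stored in the hdfs headless keytab
klist -kt /etc/security/keytabs/hdfs.headless.keytab
# obtain a ticket as the hdfs user and verify the cache
sudo -u hdfs kinit -kt /etc/security/keytabs/hdfs.headless.keytab hdfs-<cluster_name>@<REALM>
sudo -u hdfs klist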
... View more
Labels:
- Apache Hadoop
10-26-2018
10:55 PM
@Geoffrey Shelton Okot , I would love to do so, but I cannot see that "Accept" button ... Alex
... View more
10-26-2018
02:49 PM
@Jay Kumar SenSharma, Thanks a lot for helping me! Apparently I ran out of inodes. Not sure why it did not occur to me to check that in the first place ... Anyway, reformatting the filesystem and a bit of file shuffling did the trick 🙂
... View more
10-26-2018
02:42 PM
@Geoffrey Shelton Okot , the official documentation does not list the steps for installing the Kerberos clients and propagating krb5.conf to all the nodes. Does this mean the Ambari wizard will propagate krb5.conf and install krb5-workstation for me? I know that with Cloudera Manager I have to set up the clients as well, which makes perfect sense. I just wanted to know for sure before I execute the wizard.
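In case the wizard does not handle it, this is roughly what the manual client setup would look like on RHEL/CentOS nodes (a sketch assuming an MIT KDC; the host names are placeholders):
# on every cluster node: install the Kerberos client packages
yum install -y krb5-workstation krb5-libs
# then push the same krb5.conf to every node, e.g.
for h in node1 node2 node3; do scp /etc/krb5.conf ${h}:/etc/krb5.conf; done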
... View more
10-23-2018
10:44 PM
I'm trying to start the Metrics Collector, but I'm getting a strange error instead:
resource_management.core.exceptions.ExecutionFailed: Execution of '/usr/sbin/ambari-metrics-collector --config /etc/ambari-metrics-collector/conf start' returned 1. Tue Oct 23 18:05:37 EDT 2018 Starting HBase.
starting master, logging to /disks/disk1/log/ambari-metrics-collector/hbase-ams-master-node4.hdp.com.out
/usr/lib/ams-hbase/bin/hbase-daemon.sh: line 189: /disks/disk1/log/ambari-metrics-collector/hbase-ams-master-node4.hdp.com.out: No space left on device
head: cannot open ‘/disks/disk1/log/ambari-metrics-collector/hbase-ams-master-node4.hdp.com.out’ for reading: No such file or directory
/usr/sbin/ambari-metrics-collector: line 81: /disks/disk1/run/ambari-metrics-collector/ambari-metrics-collector.pid: No space left on device
ERROR: Cannot write pid /disks/disk1/run/ambari-metrics-collector/ambari-metrics-collector.pid.
It is complaining that there is no space on the device. /disks/disk1/log/ambari-metrics-collector/ambari-metrics-collector.out shows the same thing:
Java HotSpot(TM) 64-Bit Server VM warning: Cannot open file /disks/disk1/log/ambari-metrics-collector/collector-gc.log-201810231817 due to No space left on device
log4j:ERROR setFile(null,true) call failed.
java.io.FileNotFoundException: /disks/disk1/log/ambari-metrics-collector/ambari-metrics-collector.log (No space left on device)
......
But:
# df -h /disks/disk1/
Filesystem Size Used Avail Use% Mounted on
/dev/sdb1 4.9G 760M 4.2G 16% /disks/disk1
There is clearly some space there. How much space is really needed to write the output file? Thanks, Alex
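Update: as noted in my later reply, this turned out to be inode exhaustion rather than a lack of blocks. "No space left on device" is also raised when the filesystem runs out of inodes, so checking inode usage alongside block usage would have shown it right away (same mount point assumed):
# block usage looks fine...
df -h /disks/disk1/
# ...but inode usage tells the real story
df -i /disks/disk1/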
... View more
Labels:
- Apache Ambari
10-21-2018
04:22 AM
@David Schorow , do you know if this feature has finally been implemented?
... View more
09-04-2018
06:39 PM
Thanks @Vinicius Higa Murakami, I just checked, and on all the ZooKeeper nodes the only mention of the ZooKeeper log4j it shows is: File['/usr/hdp/current/zookeeper-server/conf/log4j.properties'] {'content': InlineTemplate(...), 'owner': 'zookeeper', 'group': 'hadoop', 'mode': 0644} Thanks, Alex
... View more
09-04-2018
02:52 PM
@Vinicius Higa Murakami, thanks, and sorry for the delayed response. "Run Service Check" appears to be grayed out for ZooKeeper, so I was unable to run it. The client configs seemed fine and have all the changes I made in Ambari. /var/lib/ambari-agent/data/ shows error files, but they are all zero-sized and empty on all the nodes. Thanks, Alex
... View more
08-29-2018
01:08 AM
1 Kudo
I'm trying to better understand the idea of HDFS encryption at rest. Suppose I have deployed wire encryption, I have a kerberized cluster, and I'm already using Self-Encrypting Drives (SED). What am I losing by not configuring HDFS encryption at rest? I know HDFS encryption will take a toll (maybe a small one in the best case, but still a toll) on performance, so I'm trying to see what I am losing by having only encryption at the hardware (disk) level plus encryption for data in transit. Thanks, Alex
... View more
Labels:
08-16-2018
09:50 PM
@Vinicius Higa Murakami , I was trying to watch the logs while starting ZooKeeper at the same time, however I see exactly the same old output from Jul 19. It did not write anything new even during this startup.
... View more
08-07-2018
03:51 PM
@Vinicius Higa Murakami , no mention of any zookeeper execution on any node (even on the successful one).
Successful node:
[root@mpp2 ~]# find /var/lib/ambari-agent/data/ -type f -name "output-*.txt" -exec ls -lrtah {} \; | awk '{print $9}' | tail -1 | xargs cat
2018-07-19 14:21:25,500 - Using hadoop conf dir: /usr/hdp/current/hadoop-client/conf
2018-07-19 14:21:25,503 - checked_call['hostid'] {}
2018-07-19 14:21:25,511 - checked_call returned (0, '7d0aa000')
2018-07-19 14:21:25,514 - Execute['/usr/sbin/ambari-metrics-monitor --config /etc/ambari-metrics-monitor/conf stop'] {'user': 'ams'}
2018-07-19 14:21:30,612 - Waiting for actual component stop
2018-07-19 14:21:30,719 - Process with pid 13997 is not running. Stale pid file at /var/run/ambari-metrics-monitor/ambari-metrics-monitor.pid
Failing nodes:
[root@mpp1 ~]# find /var/lib/ambari-agent/data/ -type f -name "output-*.txt" -exec ls -lrtah {} \; | awk '{print $9}' | tail -1 | xargs cat | less | grep -i execute
2018-07-19 16:27:58,587 - Execute['/var/lib/ambari-agent/tmp/changeUid.sh ambari-qa /tmp/hadoop-ambari-qa,/tmp/hsperfdata_ambari-qa,/home/ambari-qa,/tmp/ambari-qa,/tmp/sqoop-ambari-qa 0'] {'not_if': '(test $(id -u ambari-qa) -gt 1000) || (false)'}
2018-07-19 16:27:58,593 - Skipping Execute['/var/lib/ambari-agent/tmp/changeUid.sh ambari-qa /tmp/hadoop-ambari-qa,/tmp/hsperfdata_ambari-qa,/home/ambari-qa,/tmp/ambari-qa,/tmp/sqoop-ambari-qa 0'] due to not_if
2018-07-19 16:27:58,607 - Execute['/var/lib/ambari-agent/tmp/changeUid.sh hbase /home/hbase,/tmp/hbase,/usr/bin/hbase,/var/log/hbase,/tmp/hbase-hbase 1014'] {'not_if': '(test $(id -u hbase) -gt 1000) || (false)'}
2018-07-19 16:27:58,612 - Skipping Execute['/var/lib/ambari-agent/tmp/changeUid.sh hbase /home/hbase,/tmp/hbase,/usr/bin/hbase,/var/log/hbase,/tmp/hbase-hbase 1014'] due to not_if
2018-07-19 16:27:58,652 - Execute[('setenforce', '0')] {'not_if': '(! which getenforce ) || (which getenforce && getenforce | grep -q Disabled)', 'sudo': True, 'only_if': 'test -f /selinux/enforce'}
2018-07-19 16:27:58,661 - Skipping Execute[('setenforce', '0')] due to not_if
2018-07-19 16:27:59,207 - Execute['ambari-sudo.sh su hdfs -l -s /bin/bash -c 'ulimit -c unlimited ; /usr/hdp/current/hadoop-client/sbin/hadoop-daemon.sh --config /usr/hdp/current/hadoop-client/conf start journalnode''] {'environment': {'HADOOP_LIBEXEC_DIR': '/usr/hdp/current/hadoop-client/libexec'}, 'not_if': 'ambari-sudo.sh -H -E test -f /var/run/hadoop/hdfs/hadoop-hdfs-journalnode.pid && ambari-sudo.sh -H -E pgrep -F /var/run/hadoop/hdfs/hadoop-hdfs-journalnode.pid'}
[root@mpp3 ~]# find /var/lib/ambari-agent/data/ -type f -name "output-*.txt" -exec ls -lrtah {} \; | awk '{print $9}' | tail -1 | xargs cat | less | grep -i execute
2018-07-19 16:31:34,081 - Execute['/var/lib/ambari-agent/tmp/changeUid.sh ambari-qa /tmp/hadoop-ambari-qa,/tmp/hsperfdata_ambari-qa,/home/ambari-qa,/tmp/ambari-qa,/tmp/sqoop-ambari-qa 0'] {'not_if': '(test $(id -u ambari-qa) -gt 1000) || (false)'}
2018-07-19 16:31:34,088 - Skipping Execute['/var/lib/ambari-agent/tmp/changeUid.sh ambari-qa /tmp/hadoop-ambari-qa,/tmp/hsperfdata_ambari-qa,/home/ambari-qa,/tmp/ambari-qa,/tmp/sqoop-ambari-qa 0'] due to not_if
2018-07-19 16:31:34,102 - Execute['/var/lib/ambari-agent/tmp/changeUid.sh hbase /home/hbase,/tmp/hbase,/usr/bin/hbase,/var/log/hbase,/tmp/hbase-hbase 1016'] {'not_if': '(test $(id -u hbase) -gt 1000) || (false)'}
2018-07-19 16:31:34,108 - Skipping Execute['/var/lib/ambari-agent/tmp/changeUid.sh hbase /home/hbase,/tmp/hbase,/usr/bin/hbase,/var/log/hbase,/tmp/hbase-hbase 1016'] due to not_if
2018-07-19 16:31:34,142 - Execute[('setenforce', '0')] {'not_if': '(! which getenforce ) || (which getenforce && getenforce | grep -q Disabled)', 'sudo': True, 'only_if': 'test -f /selinux/enforce'}
2018-07-19 16:31:34,149 - Skipping Execute[('setenforce', '0')] due to not_if
2018-07-19 16:31:34,772 - HdfsResource['/ats/done'] {'security_enabled': False, 'hadoop_bin_dir': '/usr/hdp/current/hadoop-client/bin', 'keytab': [EMPTY], 'dfs_type': '', 'default_fs': 'hdfs://Microscopic', 'hdfs_resource_ignore_file': '/var/lib/ambari-agent/data/.hdfs_resource_ignore', 'hdfs_site': ..., 'kinit_path_local': 'kinit', 'principal_name': [EMPTY], 'user': 'hdfs', 'change_permissions_for_parents': True, 'owner': 'yarn', 'group': 'hadoop', 'hadoop_conf_dir': '/usr/hdp/current/hadoop-client/conf', 'type': 'directory', 'action': ['create_on_execute'], 'immutable_paths': [u'/hdfs/nm/app-logs', u'/apps/hive/warehouse', u'/mr-history/done', u'/tmp'], 'mode': 0755}
... View more
08-03-2018
02:54 PM
@Vinicius Higa Murakami the ids (myid) are 1, 2, and 3 respectively.
... View more
08-02-2018
01:05 PM
@Vinicius Higa Murakami , it seems they are correlated as 1, 2 and 3:
server.1=mpp1.xxxxxx:2888:3888
server.2=mpp2.xxxxxx:2888:3888
server.3=mpp3.xxxxxxx:2888:3888
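For reference, the ZooKeeper convention being checked here is that each server's myid file (stored in the dataDir from zoo.cfg) must contain exactly the N of that host's own server.N line. A quick way to verify on each host (the conf path assumes the usual /etc/zookeeper/conf symlink):
# find the dataDir, then check the id stored there
grep '^dataDir' /etc/zookeeper/conf/zoo.cfg
cat <dataDir>/myid   # should print 1 on mpp1, 2 on mpp2, 3 on mpp3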
... View more
08-01-2018
03:34 PM
zk-first-last-1000-log-out.tar.gz @Vinicius Higa Murakami , the ambari-agent log was mostly complaining about node managers, which is legit as they are down. I put all the services into maintenance mode in the hope of suppressing that 'legit' noise. I have not seen anything suspicious yet, and nothing specific to ZooKeeper either. As for zookeeper.log and zookeeper.out on the failing nodes, as I said earlier, ZooKeeper stopped writing new entries to them during startup. The last entries observed in those files were from July 19th. Just in case, I'm attaching the first and last 1000 lines of the log and out files from one of the failing nodes. But all in all I'm not sure that information is relevant for now. Thanks, Alex
... View more
07-31-2018
07:29 PM
Thanks @Vinicius Higa Murakami, the ZOO values are set in zookeeper-env.sh the same way as yours, just with real paths, and they look the same under the Ambari config properties. On one of the failing ZK nodes I have: Knox, HMaster, JournalNode, DataNode, NodeManager, HDFS client, and the stuff that I was forced to install by Ambari (Activity Explorer, HST Server, Metrics Collector, Grafana, HST Agent, Metrics Monitor). On another failing ZK node: JournalNode, ZooKeeper Failover Controller, Standby NameNode, Standby ResourceManager, Job History Server, HiveServer2, Hive Metastore, DataNode, NodeManager, Ambari Server, PostgreSQL, Spark/Spark2, Spark/Spark2 History Server, Phoenix Query Server, dependencies (Metrics Monitor, HST Agent, App Timeline Server); clients: HBase, HDFS, Hive, MR2, Pig, Slider, Spark, Tez, YARN, ZooKeeper. All the services are currently brought down for troubleshooting purposes, though ZooKeeper should start up cleanly first. Thanks, Alex
... View more
07-31-2018
01:25 PM
Hi @Vinicius Higa Murakami , I found log4j in 2 places. Since I'm relatively new to HDP, can you confirm which one is being read during startup? The log4j.properties under /etc/zookeeper/2.6.2.0-205/0/ did reflect the changes. The one under /usr/hdp/2.6.2.0-205/etc/zookeeper/conf.dist/log4j.properties did not, and the property names look a bit different (with '.' in the names). As for zookeeper-env.sh (the one under /etc/zookeeper/2.6.2.0-205/0/), it has my current log directory listed, which is also in line with the Ambari "Advanced zookeeper-env" field. So assuming the config directory is /etc/zookeeper/2.6.2.0-205/0/, the good news is that it picks up changes from Ambari; however, it does not look like they are being read. Also I noticed that new log entries are added only on one host (the one that does not fail after startup); the other 2 hosts, which stay up for 30 seconds or so after startup and then fail, do not get any new log entries even if I start them separately. Thanks, Alex
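P.S. One quick way to confirm which config directory is actually in effect (this assumes the standard HDP layout where the conf paths are symlinks managed by hdp-select/conf-select):
# follow the symlink chain to the active ZooKeeper conf directory
ls -l /usr/hdp/current/zookeeper-server/conf
readlink -f /usr/hdp/current/zookeeper-server/conf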
... View more
07-31-2018
03:42 AM
@Vinicius Higa Murakami , thanks for helping me with that. Your file looks absolutely the same as mine, line for line. And no other config groups are configured. I was hoping to get into debug mode to see why ZooKeeper is failing after startup on all but one node. But I'm not sure at this point if there is any other way to look at the logs and find the reason. Thanks, Alex
... View more
07-26-2018
03:18 PM
I'm having issues with ZooKeeper stopping a few seconds after startup. My problem, though, is that the changes made to the log4j properties through Ambari do not seem to be getting propagated. I'm trying to change rootLogger to DEBUG and change the log file name from the default to zookeeper.log. Here is the "Advanced zookeeper log4j":
# DEFAULT: console appender only
#log4j.rootLogger=INFO, CONSOLE, ROLLINGFILE
# Example with rolling log file
log4j.rootLogger=DEBUG, CONSOLE, ROLLINGFILE
# Example with rolling log file and tracing
#log4j.rootLogger=TRACE, CONSOLE, ROLLINGFILE, TRACEFILE
#
# Log INFO level and above messages to the console
#
log4j.appender.CONSOLE=org.apache.log4j.ConsoleAppender
log4j.appender.CONSOLE.Threshold=INFO
log4j.appender.CONSOLE.layout=org.apache.log4j.PatternLayout
log4j.appender.CONSOLE.layout.ConversionPattern=%d{ISO8601} - %-5p [%t:%C{1}@%L] - %m%n
#
# Add ROLLINGFILE to rootLogger to get log file output
# Log DEBUG level and above messages to a log file
log4j.appender.ROLLINGFILE=org.apache.log4j.RollingFileAppender
log4j.appender.ROLLINGFILE.Threshold=DEBUG
log4j.appender.ROLLINGFILE.File={{zk_log_dir}}/zookeeper.log
# Max log file size of 10MB
log4j.appender.ROLLINGFILE.MaxFileSize={{zookeeper_log_max_backup_size}}MB
# uncomment the next line to limit number of backup files
#log4j.appender.ROLLINGFILE.MaxBackupIndex={{zookeeper_log_number_of_backup_files}}
log4j.appender.ROLLINGFILE.layout=org.apache.log4j.PatternLayout
log4j.appender.ROLLINGFILE.layout.ConversionPattern=%d{ISO8601} - %-5p [%t:%C{1}@%L] - %m%n
#
# Add TRACEFILE to rootLogger to get log file output
# Log DEBUG level and above messages to a log file
log4j.appender.TRACEFILE=org.apache.log4j.FileAppender
log4j.appender.TRACEFILE.Threshold=TRACE
log4j.appender.TRACEFILE.File=zookeeper_trace.log
log4j.appender.TRACEFILE.layout=org.apache.log4j.PatternLayout
### Notice we are including log4j's NDC here (%x)
log4j.appender.TRACEFILE.layout.ConversionPattern=%d{ISO8601} - %-5p [%t:%C{1}@%L][%x] - %m%n
But yet, after restarting, ps -ef | grep zookeeper shows the following:
zookeep+ 18598 1 4 23:49 ? 00:00:00 /usr/jdk64/jdk1.8.0_112/bin/java -Dzookeeper.log.dir=/xxx/zk/logs -Dzookeeper.log.file=zookeeper-zookeeper-server-xxxxxxxxxxx.log -Dzookeeper.root.logger=INFO,ROLLINGFILE .....
Note that neither the DEBUG level nor the new log file name (zookeeper.log) has been propagated by Ambari. Is there any property missing? Thanks,
Alex
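Follow-up note: ZooKeeper's start scripts pass -Dzookeeper.root.logger from the ZOO_LOG4J_PROP environment variable (zkEnv.sh defaults it when unset) rather than taking it from log4j.properties, so a workaround sketch, assuming the HDP scripts honor ZOO_LOG4J_PROP the same way stock ZooKeeper does, is to export it from the Ambari "Advanced zookeeper-env" template and restart ZooKeeper:
# zookeeper-env.sh (illustrative snippet, not my full file)
export ZOO_LOG_DIR=/xxx/zk/logs
export ZOO_LOG4J_PROP="DEBUG,ROLLINGFILE"
# then verify after the restart:
ps -ef | grep zookeeper | grep root.logger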
... View more
Labels:
- Apache Ambari
07-26-2018
04:15 AM
@Sandeep Kumar I'm having the same issue with ZooKeeper stopping a few seconds after startup. My problem, though, is that the changes made to the log4j properties through Ambari do not seem to be getting propagated. Here is the "Advanced zookeeper log4j":
# DEFAULT: console appender only
#log4j.rootLogger=INFO, CONSOLE, ROLLINGFILE
# Example with rolling log file
log4j.rootLogger=DEBUG, CONSOLE, ROLLINGFILE
# Example with rolling log file and tracing
#log4j.rootLogger=TRACE, CONSOLE, ROLLINGFILE, TRACEFILE
#
# Log INFO level and above messages to the console
#
log4j.appender.CONSOLE=org.apache.log4j.ConsoleAppender
log4j.appender.CONSOLE.Threshold=INFO
log4j.appender.CONSOLE.layout=org.apache.log4j.PatternLayout
log4j.appender.CONSOLE.layout.ConversionPattern=%d{ISO8601} - %-5p [%t:%C{1}@%L] - %m%n
#
# Add ROLLINGFILE to rootLogger to get log file output
# Log DEBUG level and above messages to a log file
log4j.appender.ROLLINGFILE=org.apache.log4j.RollingFileAppender
log4j.appender.ROLLINGFILE.Threshold=DEBUG
log4j.appender.ROLLINGFILE.File={{zk_log_dir}}/zookeeper.log
# Max log file size of 10MB
log4j.appender.ROLLINGFILE.MaxFileSize={{zookeeper_log_max_backup_size}}MB
# uncomment the next line to limit number of backup files
#log4j.appender.ROLLINGFILE.MaxBackupIndex={{zookeeper_log_number_of_backup_files}}
log4j.appender.ROLLINGFILE.layout=org.apache.log4j.PatternLayout
log4j.appender.ROLLINGFILE.layout.ConversionPattern=%d{ISO8601} - %-5p [%t:%C{1}@%L] - %m%n
#
# Add TRACEFILE to rootLogger to get log file output
# Log DEBUG level and above messages to a log file
log4j.appender.TRACEFILE=org.apache.log4j.FileAppender
log4j.appender.TRACEFILE.Threshold=TRACE
log4j.appender.TRACEFILE.File=zookeeper_trace.log
log4j.appender.TRACEFILE.layout=org.apache.log4j.PatternLayout
### Notice we are including log4j's NDC here (%x)
log4j.appender.TRACEFILE.layout.ConversionPattern=%d{ISO8601} - %-5p [%t:%C{1}@%L][%x] - %m%n
But yet, ps -ef | grep zookeeper shows:
zookeep+ 18598 1 4 23:49 ? 00:00:00 /usr/jdk64/jdk1.8.0_112/bin/java -Dzookeeper.log.dir=/xxx/zk/logs -Dzookeeper.log.file=zookeeper-zookeeper-server-xxxxxxxxxxx.log -Dzookeeper.root.logger=INFO,ROLLINGFILE .....
Note that neither the DEBUG level nor the new log file name (zookeeper.log) has been propagated by Ambari. Is there any property missing? Thanks, Alex
... View more