Member since
10-04-2016
243
Posts
281
Kudos Received
43
Solutions
My Accepted Solutions
Views | Posted |
---|---|
1184 | 01-16-2018 03:38 PM |
6153 | 11-13-2017 05:45 PM |
3060 | 11-13-2017 12:30 AM |
1524 | 10-27-2017 03:58 AM |
28464 | 10-19-2017 03:17 AM |
09-05-2018
09:05 PM
2 Kudos
If you have started using Hive LLAP, you will have noticed that by default it is configured to use log4j2. The default configuration uses advanced log4j2 features such as rolling over logs based on time interval and size.

Over time, a lot of old log files accumulate. With log4j1, you would typically compress those files manually, or add additional jars and change the configuration, to achieve the same result. With log4j2, a simple configuration change ensures that every time a log file is rolled over, it gets compressed for optimal use of storage space.

Default configuration:

To automatically compress the rolled-over log files, update the highlighted line to:

appender.DRFA.filePattern = ${sys:hive.log.dir}/${sys:hive.log.file}.%d{yyyy-MM-dd}-%i.gz

-%i ensures that in the rare scenario where logging has increased and the size threshold is reached more than once in the specified interval, the previously rolled-over file won't get overwritten.
.gz ensures that files are compressed using gzip.

To understand the finer details of log4j2 appenders, you may check out the official documentation.

Similarly, you can make the same change to the llap-cli log settings.
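To make the role of the date and -%i parts of the DRFA file pattern above more concrete, here is a minimal Python sketch. It is not log4j2 code and does not reproduce log4j2's behaviour exactly; it only mimics how the pattern expands into a file name. The function name, directory and log file name are hypothetical.

```python
from datetime import date


def rolled_file_name(log_dir, log_file, day, index):
    """Mimic the expansion of <dir>/<file>.%d{yyyy-MM-dd}-%i.gz (illustration only)."""
    return "{}/{}.{}-{}.gz".format(log_dir, log_file, day.strftime("%Y-%m-%d"), index)


# Two size-based rollovers on the same day get different indexes,
# so the second archive does not replace the first one.
day = date(2018, 9, 5)
first = rolled_file_name("/var/log/hive", "llap.log", day, 1)
second = rolled_file_name("/var/log/hive", "llap.log", day, 2)
print(first)
print(second)
```

Without the index, both rollovers would map to the same name for that day, which is the overwrite case that -%i guards against.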
05-15-2018
07:49 AM
@Dinesh Chitlangia Unfortunately the native build on OS X is broken by HDFS-13403 at this moment on trunk. You have two options:

1. If you don't need the native build, you can build Hadoop successfully without the -Pnative option.
2. The build issue is fixed by HDFS-13534, but it's not merged yet (at the time of writing this answer). You can either wait until it gets merged, or apply it manually:

wget https://issues.apache.org/jira/secure/attachment/12922534/HDFS-13534.001.patch
git apply HDFS-13534.001.patch
01-16-2018
03:38 PM
1 Kudo
@Rajesh K There is no harm in starting up both services and turning off maintenance mode. As for your Atlas service crashing every time after startup, that could indicate several problems. The most common one is an out-of-memory error. Could you check the logs and share the error stack trace?
12-01-2017
06:06 PM
2 Kudos
When running a custom Java application that connects to Hive via JDBC, after migration to HDP-2.6.x the application fails to start with a NoClassDefFoundError or ClassNotFoundException related to a Hive class, for example:

Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hive/service/cli/thrift/TCLIService$Iface
    at org.apache.hive.jdbc.HiveDriver.connect(HiveDriver.java:105)
    at java.sql.DriverManager.getConnection(DriverManager.java:664)
    at java.sql.DriverManager.getConnection(DriverManager.java:270)

Root Cause

Prior to HDP-2.6.x, hive-jdbc.jar was a symlink pointing to the "standalone" JDBC jar (the one intended for non-Hadoop apps, such as a generic application that accesses a database through a JDBC driver). For example, in HDP 2.5.0:

/usr/hdp/current/hive-client/lib/hive-jdbc.jar -> hive-jdbc-1.2.1000.2.5.0.0-1245-standalone.jar

From HDP-2.6.x onwards, hive-jdbc.jar points to the "hadoop env" JDBC driver, which has dependencies on many other Hadoop JARs. For example, in HDP 2.6.2:

/usr/hdp/current/hive-client/lib/hive-jdbc.jar -> hive-jdbc-1.2.1000.2.6.2.0-205.jar

or in HDP-2.6.3:

/usr/hdp/current/hive-client/lib/hive-jdbc.jar -> hive-jdbc-1.2.1000.2.6.3.0-235.jar

Does this mean the HDP stack no longer includes a standalone JAR? No. The standalone jar has been moved to this path:

/usr/hdp/current/hive-client/jdbc

Two ways to solve this:

1. Change the custom Java application's classpath to use the hive-jdbc-*-standalone.jar explicitly. As noted above, the standalone jar is now available in a different path.

For example, in HDP-2.6.2:

/usr/hdp/current/hive-client/jdbc/hive-jdbc-1.2.1000.2.6.2.0-205-standalone.jar

In HDP-2.6.3:

/usr/hdp/current/hive-client/jdbc/hive-jdbc-1.2.1000.2.6.3.0-235-standalone.jar

2. If the custom Java application uses other Hadoop components/JARs, add the following to its HADOOP_CLASSPATH:

/usr/hdp/current/hive-client/lib/hive-metastore-*.jar:/usr/hdp/current/hive-client/lib/hive-common-*.jar:/usr/hdp/current/hive-client/lib/hive-cli-*.jar:/usr/hdp/current/hive-client/lib/hive-exec-*.jar:/usr/hdp/current/hive-client/lib/hive-service.jar:/usr/hdp/current/hive-client/lib/libfb303-*.jar:/usr/hdp/current/hive-client/lib/libthrift-*.jar:/usr/hdp/current/hadoop-client/lib/log4j*.jar:/usr/hdp/current/hadoop-client/lib/slf4j-api-*.jar:/usr/hdp/current/hadoop-client/lib/slf4j-log4j12-*.jar:/usr/hdp/current/hadoop-client/lib/commons-logging-*.jar
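If you are unsure which situation a given host is in, a small check along these lines may help. This is only a sketch, assuming the directory layout described above (lib/hive-jdbc.jar as a symlink, standalone jars under jdbc/); the function names are hypothetical and not part of any HDP tooling.

```python
import glob
import os


def find_standalone_jdbc_jars(hive_client_home="/usr/hdp/current/hive-client"):
    """Return the standalone Hive JDBC jars found under <home>/jdbc, sorted by name."""
    pattern = os.path.join(hive_client_home, "jdbc", "hive-jdbc-*-standalone.jar")
    return sorted(glob.glob(pattern))


def jdbc_symlink_is_standalone(hive_client_home="/usr/hdp/current/hive-client"):
    """Check whether <home>/lib/hive-jdbc.jar resolves to a standalone jar."""
    link = os.path.join(hive_client_home, "lib", "hive-jdbc.jar")
    # realpath follows the symlink to the versioned jar it points to.
    return os.path.realpath(link).endswith("-standalone.jar")


if __name__ == "__main__":
    if not jdbc_symlink_is_standalone():
        print("hive-jdbc.jar is not the standalone jar; candidates for the classpath:")
        for jar in find_standalone_jdbc_jars():
            print("  " + jar)
```

On a host matching the HDP 2.6.x layout described above, the symlink check would be expected to return False and the listing would show the jar to put on the application's classpath for option 1.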
11-16-2017
03:58 PM
2 Kudos
Description

During an HDP upgrade, the Hive Metastore restart step fails with the message "ValueError: time data '2017-05-10 19:08:30' does not match format '%Y-%m-%d %H:%M:%S.%f'".

Following is the stack trace:

Traceback (most recent call last):
  File "/var/lib/ambari-agent/cache/common-services/HIVE/0.12.0.2.0/package/scripts/hive_metastore.py", line 211, in <module>
    HiveMetastore().execute()
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 329, in execute
    method(env)
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 841, in restart
    self.pre_upgrade_restart(env, upgrade_type=upgrade_type)
  File "/var/lib/ambari-agent/cache/common-services/HIVE/0.12.0.2.0/package/scripts/hive_metastore.py", line 118, in pre_upgrade_restart
    self.upgrade_schema(env)
  File "/var/lib/ambari-agent/cache/common-services/HIVE/0.12.0.2.0/package/scripts/hive_metastore.py", line 150, in upgrade_schema
    status_params.tmp_dir)
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/functions/security_commons.py", line 242, in cached_kinit_executor
    if (now - datetime.strptime(last_run_time, "%Y-%m-%d %H:%M:%S.%f") > timedelta(minutes=expiration_time)):
  File "/usr/lib64/python2.6/_strptime.py", line 325, in _strptime
    (data_string, format))
ValueError: time data '2017-05-10 19:08:30' does not match format '%Y-%m-%d %H:%M:%S.%f'

Root cause

During the upgrade, the data is read from a file, such as *_tmp.txt, under the /var/lib/ambari-agent/tmp/kinit_executor_cache directory. This issue occurs if this file has not been updated and points to an older date.

Solution

1. Log in to the Hive Metastore host.
2. Move the *_tmp.txt files out of the cache directory:

mv /var/lib/ambari-agent/tmp/kinit_executor_cache/*_tmp.txt /tmp

3. Retry the Restart Hive Metastore step from the Ambari Upgrade screen.
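The ValueError itself is easy to reproduce outside Ambari: the cached value in the error message has no fractional seconds, while the format string requires them (%f). The snippet below is a minimal sketch of that mismatch. The parse_lenient helper is hypothetical and only shown for illustration; it is not how Ambari handles the cache, and the workaround above remains moving the stale files away.

```python
from datetime import datetime

STRICT_FORMAT = "%Y-%m-%d %H:%M:%S.%f"  # format string from the stack trace


def parse_strict(value):
    """Parse with the format from the stack trace; fails without fractional seconds."""
    return datetime.strptime(value, STRICT_FORMAT)


def parse_lenient(value):
    """Hypothetical helper: accept timestamps with or without fractional seconds."""
    for fmt in (STRICT_FORMAT, "%Y-%m-%d %H:%M:%S"):
        try:
            return datetime.strptime(value, fmt)
        except ValueError:
            continue
    raise ValueError("unrecognised timestamp: %r" % value)


if __name__ == "__main__":
    cached = "2017-05-10 19:08:30"  # value taken from the error message
    try:
        parse_strict(cached)
    except ValueError as err:
        print("strict parse failed:", err)
    print("lenient parse:", parse_lenient(cached))
```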
11-13-2017
06:14 AM
1 Kudo
You need to do that step. That is the one which configures the proxy for your ambari principal.
09-24-2018
08:44 PM
@kkanchu Thanks for pointing out! Updated the article.
11-20-2017
06:30 AM
This was great help.
10-27-2017
04:00 AM
5 Kudos
HDFS per-user metrics aren't emitted by default. Exercise caution before enabling them, and make sure to refer to the details of the client and service port numbers.

To use the HDFS - Users dashboard in your Grafana instance, and to view metrics for HDFS per user, you need to add these custom properties to your configuration.

Step-by-step guide

Presumption for this guide: this is an HA environment with dfs.internal.nameservices=nnha and dfs.ha.namenodes.nnha=nn1,nn2 in Ambari, HDFS > Configs > Advanced > Custom hdfs-site.

1. In Ambari, HDFS > Configs > Advanced > Custom hdfs-site, add the following properties:

dfs.namenode.servicerpc-address.<dfs.internal.nameservices>.nn1=<namenodehost1>:8050
dfs.namenode.servicerpc-address.<dfs.internal.nameservices>.nn2=<namenodehost2>:8050
ipc.8020.callqueue.impl=org.apache.hadoop.ipc.FairCallQueue
ipc.8020.backoff.enable=true
ipc.8020.scheduler.impl=org.apache.hadoop.ipc.DecayRpcScheduler
ipc.8020.scheduler.priority.levels=3
ipc.8020.decay-scheduler.backoff.responsetime.enable=true
ipc.8020.decay-scheduler.backoff.responsetime.thresholds=10,20,30

If you have already enabled the Service RPC port, you can skip the first two lines about servicerpc-address.
Replace 8020 with your NameNode RPC port if it is different. DO NOT replace it with the Service RPC port or the DataNode Lifeline port.

2. After this change you may see issues such as both NameNodes showing as Active, or both as Standby, in Ambari. To avoid this issue:
a. Stop the ZKFC on both NameNodes.
b. Run the following commands from one of the NameNode hosts as the hdfs user:

su - hdfs
hdfs zkfc -formatZK

c. Restart all ZKFCs.

3. Restart HDFS; you should see the metrics being emitted.

4. After a few minutes, you should also be able to use the HDFS - Users dashboard in Grafana.

Things to ensure:
Client port: 8020 (if different, replace it with the appropriate port in all keys)
Service port: 8021 (if different, replace it with the appropriate port in the first value)
namenodehost1 and namenodehost2: need to be replaced with actual values from the cluster and must be FQDNs.
dfs.internal.nameservices: needs to be replaced with the actual value from the cluster.

Example:
dfs.namenode.servicerpc-address.nnha.nn1=<namenodehost1>:8050
dfs.namenode.servicerpc-address.nnha.nn2=<namenodehost2>:8050

* For more than 2 NameNodes in your HA environment, add one additional line for each additional NameNode:
dfs.namenode.servicerpc-address.<dfs.internal.nameservices>.nnX=<namenodehostX>:8021

Adapted from this wiki, which describes how to enable per-user HDFS metrics for a non-HA environment.

Note: This article has been validated against Ambari-2.5.2 and HDP-2.6.2. It will not work in older versions of Ambari due to this bug: https://issues.apache.org/jira/browse/AMBARI-21640
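Because the same client port appears in every ipc.* key and each NameNode needs its own servicerpc-address line, it can be less error-prone to generate the property list than to type it. The following is a rough Python sketch of that idea, not an Ambari API. The function name and the example.com host names are hypothetical, and the service port is a required argument because the right value depends on your cluster.

```python
def per_user_metrics_properties(nameservice, namenode_hosts, client_port, service_port):
    """Build the custom hdfs-site properties listed in step 1 (illustration only).

    namenode_hosts maps a NameNode id (e.g. "nn1") to its FQDN.
    """
    props = {}
    # One servicerpc-address line per NameNode in the nameservice.
    for nn_id, host in namenode_hosts.items():
        key = "dfs.namenode.servicerpc-address.{}.{}".format(nameservice, nn_id)
        props[key] = "{}:{}".format(host, service_port)
    # The ipc.* keys are keyed on the client (NameNode RPC) port, not the service port.
    prefix = "ipc.{}".format(client_port)
    props[prefix + ".callqueue.impl"] = "org.apache.hadoop.ipc.FairCallQueue"
    props[prefix + ".backoff.enable"] = "true"
    props[prefix + ".scheduler.impl"] = "org.apache.hadoop.ipc.DecayRpcScheduler"
    props[prefix + ".scheduler.priority.levels"] = "3"
    props[prefix + ".decay-scheduler.backoff.responsetime.enable"] = "true"
    props[prefix + ".decay-scheduler.backoff.responsetime.thresholds"] = "10,20,30"
    return props


if __name__ == "__main__":
    # Hypothetical host names; replace with the FQDNs of your NameNodes.
    hosts = {"nn1": "nn1.example.com", "nn2": "nn2.example.com"}
    for key, value in per_user_metrics_properties("nnha", hosts, 8020, 8021).items():
        print("{}={}".format(key, value))
```

The printed key=value lines can then be reviewed and added under Custom hdfs-site as in step 1. For a third NameNode, add another entry to the hosts mapping.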
01-18-2018
03:44 PM
Hi Dinesh. These are default values for recent versions of Hive (0.13.0 and later). Sources: https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.server2.async.exec.threads and http://atlas.apache.org/Bridge-Hive.html