About dineshc

dineshc · ‎09-05-2018

If you have started using Hive LLAP, you will have noticed that by default it is configured to use log4j2. The default configuration uses advanced log4j2 features such as rolling over logs based on time interval and size.

Over time, many old log files accumulate. With log4j1, you would typically compress those files manually, or add extra jars and change the configuration to get the same result. With log4j2, a simple configuration change ensures that every time a log file is rolled over, it is compressed for optimal use of storage space.

Default configuration:

To automatically compress the rolled-over log files, update the highlighted line to:

appender.DRFA.filePattern = ${sys:hive.log.dir}/${sys:hive.log.file}.%d{yyyy-MM-dd}-%i.gz

-%i ensures that, in the rare case where increased logging causes the size threshold to be reached more than once in the specified interval, the previously rolled-over file is not overwritten.
.gz ensures that the files are compressed using gzip.

To understand the finer details of log4j2 appenders, check the official documentation.

You can make similar changes to the llap-cli log settings.
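For anyone who wants to script this change instead of editing the file by hand, here is a minimal Python sketch. It only rewrites the appender.DRFA.filePattern line in a block of properties text so that it ends in -%i.gz. The function name and the sample input line are assumptions for illustration; check them against your own log4j2 properties before using anything like this.

```python
TARGET_KEY = "appender.DRFA.filePattern"
COMPRESSED_SUFFIX = "-%i.gz"


def enable_gzip_rollover(properties_text):
    """Return properties_text with the DRFA file pattern ending in -%i.gz.

    Hypothetical helper: every line other than TARGET_KEY is left untouched,
    and a pattern that already ends in .gz is not modified again.
    """
    rewritten = []
    for line in properties_text.splitlines(keepends=True):
        body = line.rstrip("\r\n")
        ending = line[len(body):]  # keep the original line ending
        key, sep, value = body.partition("=")
        if sep and key.strip() == TARGET_KEY:
            pattern = value.strip()
            if not pattern.endswith(".gz"):
                # Avoid producing "-%i-%i.gz" if the counter is already there.
                if pattern.endswith("-%i"):
                    pattern = pattern[: -len("-%i")]
                body = TARGET_KEY + " = " + pattern + COMPRESSED_SUFFIX
        rewritten.append(body + ending)
    return "".join(rewritten)


if __name__ == "__main__":
    # Assumed example of an uncompressed pattern, not copied from a real cluster.
    sample = (
        "appender.DRFA.filePattern = "
        "${sys:hive.log.dir}/${sys:hive.log.file}.%d{yyyy-MM-dd}\n"
    )
    print(enable_gzip_rollover(sample), end="")
```

The helper is idempotent, so running it twice on the same text does not append the suffix a second time.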

gnovak · ‎05-15-2018

@Dinesh Chitlangia Unfortunately, the native build on OS X is currently broken on trunk by HDFS-13403. You have two options:

1. If you don't need the native build, you can build Hadoop successfully without the -Pnative option.
2. The build issue is fixed by HDFS-13534, but it is not merged yet (at the time of writing this answer). You can either wait until it gets merged, or apply it manually:

wget https://issues.apache.org/jira/secure/attachment/12922534/HDFS-13534.001.patch
git apply HDFS-13534.001.patch

dineshc · ‎01-16-2018

@Rajesh K There is no harm in starting both services and turning off maintenance mode. As for your Atlas service crashing every time after startup, that could indicate several problems. The most common one is an out-of-memory error. Could you check the logs and share the error stack trace?

dineshc · ‎12-01-2017

When a custom Java application connects to Hive via JDBC, after migration to HDP-2.6.x the application fails to start with a NoClassDefFoundError or ClassNotFoundException related to a Hive class, for example:

Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hive/service/cli/thrift/TCLIService$Iface
    at org.apache.hive.jdbc.HiveDriver.connect(HiveDriver.java:105)
    at java.sql.DriverManager.getConnection(DriverManager.java:664)
    at java.sql.DriverManager.getConnection(DriverManager.java:270)

Root Cause

Prior to HDP-2.6.x, hive-jdbc.jar was a symlink pointing to the "standalone" JDBC jar (the one intended for non-Hadoop apps, such as a generic application that accesses a database through a JDBC driver). For example, in HDP 2.5.0:

/usr/hdp/current/hive-client/lib/hive-jdbc.jar -> hive-jdbc-1.2.1000.2.5.0.0-1245-standalone.jar

From HDP-2.6.x onwards, hive-jdbc.jar points to the "hadoop env" JDBC driver, which depends on many other Hadoop JARs. For example, in HDP 2.6.2:

/usr/hdp/current/hive-client/lib/hive-jdbc.jar -> hive-jdbc-1.2.1000.2.6.2.0-205.jar

or in HDP-2.6.3:

/usr/hdp/current/hive-client/lib/hive-jdbc.jar -> hive-jdbc-1.2.1000.2.6.3.0-235.jar

Does this mean the HDP stack no longer includes a standalone JAR? No. The standalone jar has been moved to this path:

/usr/hdp/current/hive-client/jdbc

Two ways to solve this:

1. Change the custom Java application's classpath to use the hive-jdbc-*-standalone.jar explicitly. As noted above, the standalone jar is now available in a different path. For example, in HDP-2.6.2:

/usr/hdp/current/hive-client/jdbc/hive-jdbc-1.2.1000.2.6.2.0-205-standalone.jar

In HDP-2.6.3:

/usr/hdp/current/hive-client/jdbc/hive-jdbc-1.2.1000.2.6.3.0-235-standalone.jar

2. If the custom Java application uses other Hadoop components/JARs, add the following to its HADOOP_CLASSPATH:

/usr/hdp/current/hive-client/lib/hive-metastore-*.jar:/usr/hdp/current/hive-client/lib/hive-common-*.jar:/usr/hdp/current/hive-client/lib/hive-cli-*.jar:/usr/hdp/current/hive-client/lib/hive-exec-*.jar:/usr/hdp/current/hive-client/lib/hive-service.jar:/usr/hdp/current/hive-client/lib/libfb303-*.jar:/usr/hdp/current/hive-client/lib/libthrift-*.jar:/usr/hdp/current/hadoop-client/lib/log4j*.jar:/usr/hdp/current/hadoop-client/lib/slf4j-api-*.jar:/usr/hdp/current/hadoop-client/lib/slf4j-log4j12-*.jar:/usr/hdp/current/hadoop-client/lib/commons-logging-*.jar
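Because the standalone jar's file name changes with every HDP build, a launcher script may prefer to locate it by pattern rather than hard-code the version. The following is a rough Python sketch of that idea for option 1. The function names and the app.jar entry are hypothetical; only the directory and the hive-jdbc-*-standalone.jar pattern come from the article.

```python
import glob
import os

# Directory named in the article for HDP-2.6.x and later.
DEFAULT_JDBC_DIR = "/usr/hdp/current/hive-client/jdbc"


def find_standalone_jdbc_jars(jdbc_dir=DEFAULT_JDBC_DIR):
    """Return a sorted list of standalone Hive JDBC jars found in jdbc_dir."""
    pattern = os.path.join(jdbc_dir, "hive-jdbc-*-standalone.jar")
    return sorted(glob.glob(pattern))


def build_classpath(jdbc_dir, app_entries):
    """Join the application's own classpath entries with the standalone jar.

    If several jars match, the lexicographically last one is used, which is
    only a rough stand-in for "newest".
    """
    jars = find_standalone_jdbc_jars(jdbc_dir)
    if not jars:
        raise FileNotFoundError("no standalone Hive JDBC jar in " + jdbc_dir)
    return os.pathsep.join(list(app_entries) + [jars[-1]])


if __name__ == "__main__":
    # "app.jar" is a placeholder for the custom application's own jar.
    try:
        print(build_classpath(DEFAULT_JDBC_DIR, ["app.jar"]))
    except FileNotFoundError as err:
        print(err)
```

The resulting string can be passed to java -cp, so the application keeps working when the jar version changes after an upgrade.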

dineshc · ‎11-16-2017

Description

During an HDP upgrade, the Hive Metastore restart step fails with the message "ValueError: time data '2017-05-10 19:08:30' does not match format '%Y-%m-%d %H:%M:%S.%f'".

The stack trace is as follows:

Traceback (most recent call last):
  File "/var/lib/ambari-agent/cache/common-services/HIVE/0.12.0.2.0/package/scripts/hive_metastore.py", line 211, in <module>
    HiveMetastore().execute()
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 329, in execute
    method(env)
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 841, in restart
    self.pre_upgrade_restart(env, upgrade_type=upgrade_type)
  File "/var/lib/ambari-agent/cache/common-services/HIVE/0.12.0.2.0/package/scripts/hive_metastore.py", line 118, in pre_upgrade_restart
    self.upgrade_schema(env)
  File "/var/lib/ambari-agent/cache/common-services/HIVE/0.12.0.2.0/package/scripts/hive_metastore.py", line 150, in upgrade_schema
    status_params.tmp_dir)
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/functions/security_commons.py", line 242, in cached_kinit_executor
    if (now - datetime.strptime(last_run_time, "%Y-%m-%d %H:%M:%S.%f") > timedelta(minutes=expiration_time)):
  File "/usr/lib64/python2.6/_strptime.py", line 325, in _strptime
    (data_string, format))
ValueError: time data '2017-05-10 19:08:30' does not match format '%Y-%m-%d %H:%M:%S.%f'

Root cause

During the upgrade, the data is read from a file, such as *_tmp.txt, under the /var/lib/ambari-agent/tmp/kinit_executor_cache directory. This issue occurs if this file is not updated and points to an older date.

Solution

1. Log in to the Hive Metastore host.
2. Move the *_tmp.txt files:
mv /var/lib/ambari-agent/tmp/kinit_executor_cache/*_tmp.txt /tmp
3. Retry the Restart Hive Metastore step from the Ambari Upgrade screen.
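The error message itself can be reproduced outside Ambari: the cached value has no fractional seconds, while the format string in the stack trace requires them (%f). The short Python sketch below shows the mismatch, along with a tolerant parser. The parse_last_run_time helper is purely illustrative and is not Ambari's code or its fix; moving the cache files as described above remains the workaround.

```python
from datetime import datetime

FORMAT_WITH_MICROS = "%Y-%m-%d %H:%M:%S.%f"  # format used in the stack trace
FORMAT_NO_MICROS = "%Y-%m-%d %H:%M:%S"       # shape of the cached value


def parse_last_run_time(value):
    """Parse a timestamp with or without fractional seconds (hypothetical helper)."""
    for fmt in (FORMAT_WITH_MICROS, FORMAT_NO_MICROS):
        try:
            return datetime.strptime(value, fmt)
        except ValueError:
            continue
    raise ValueError("unrecognised timestamp: %r" % (value,))


if __name__ == "__main__":
    cached = "2017-05-10 19:08:30"
    try:
        datetime.strptime(cached, FORMAT_WITH_MICROS)
    except ValueError as err:
        # Same kind of failure as in the upgrade step.
        print("strict parse failed:", err)
    print("tolerant parse:", parse_last_run_time(cached))
```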

dineshc · ‎11-13-2017

You need to do that step. That is the one which configures the proxy for your ambari principal.

dineshc · ‎09-24-2018

@kkanchu Thanks for pointing out! Updated the article.

rdevprasad1 · ‎11-20-2017

This was great help.

dineshc · ‎10-27-2017

HDFS per-user metrics aren't emitted by default. Exercise caution before enabling them, and pay close attention to the client and service port numbers described below. To use the HDFS - Users dashboard in your Grafana instance and to view HDFS metrics per user, you need to add these custom properties to your configuration.

Step-by-step guide

Assumption for this guide: this is an HA environment with dfs.internal.nameservices=nnha and dfs.ha.namenodes.nnha=nn1,nn2 in Ambari, HDFS > Configs > Advanced > Custom hdfs-site.

1. In Ambari, HDFS > Configs > Advanced > Custom hdfs-site, add the following properties:

dfs.namenode.servicerpc-address.<dfs.internal.nameservices>.nn1=<namenodehost1>:8050
dfs.namenode.servicerpc-address.<dfs.internal.nameservices>.nn2=<namenodehost2>:8050
ipc.8020.callqueue.impl=org.apache.hadoop.ipc.FairCallQueue
ipc.8020.backoff.enable=true
ipc.8020.scheduler.impl=org.apache.hadoop.ipc.DecayRpcScheduler
ipc.8020.scheduler.priority.levels=3
ipc.8020.decay-scheduler.backoff.responsetime.enable=true
ipc.8020.decay-scheduler.backoff.responsetime.thresholds=10,20,30

If you have already enabled the Service RPC port, you can skip the first two lines about servicerpc-address. Replace 8020 with your NameNode RPC port if it is different. DO NOT replace it with the Service RPC port or the DataNode Lifeline port.

2. After this change, Ambari may show both NameNodes as Active or both as Standby. To avoid this issue:
a. Stop the ZKFC on both NameNodes.
b. Run the following commands from one of the NameNode hosts as the hdfs user:
su - hdfs
hdfs zkfc -formatZK
c. Restart all ZKFCs.

3. Restart HDFS; you should then see the metrics being emitted.

4. After a few minutes, you should also be able to use the HDFS - Users dashboard in Grafana.
Things to ensure:

Client port: 8020 (if different, replace it with the appropriate port in all keys)
Service port: 8021 (if different, replace it with the appropriate port in the first value)
namenodehost1 and namenodehost2: must be replaced with actual values from the cluster and must be FQDNs.
dfs.internal.nameservices: must be replaced with the actual value from the cluster. Example:

dfs.namenode.servicerpc-address.nnha.nn1=<namenodehost1>:8050
dfs.namenode.servicerpc-address.nnha.nn2=<namenodehost2>:8050

* For more than 2 NameNodes in your HA environment, add one additional line for each additional NameNode:

dfs.namenode.servicerpc-address.<dfs.internal.nameservices>.nnX=<namenodehostX>:8021

Adapted from this wiki, which describes how to enable per-user HDFS metrics for a non-HA environment.

Note: This article has been validated against Ambari-2.5.2 and HDP-2.6.2. It will not work in older versions of Ambari due to this bug: https://issues.apache.org/jira/browse/AMBARI-21640
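Since the property keys embed the nameservice, the NameNode IDs and the client RPC port, it is easy to mistype one of them. As a rough aid, here is a small Python sketch that generates the property list from those inputs. The function name, its parameters and the host names are assumptions for illustration. Both ports are passed in explicitly because they differ between clusters, so the 8050 used in the example call is just a placeholder for your own Service RPC port.

```python
def per_user_metrics_properties(nameservice, namenode_hosts, client_port, service_rpc_port):
    """Build the custom hdfs-site properties listed in step 1.

    namenode_hosts maps NameNode IDs (for example "nn1") to FQDNs.
    Hypothetical helper: it only formats strings, it does not talk to Ambari.
    """
    props = {}
    for nn_id, host in namenode_hosts.items():
        key = "dfs.namenode.servicerpc-address.%s.%s" % (nameservice, nn_id)
        props[key] = "%s:%d" % (host, service_rpc_port)

    # All ipc.* keys are keyed on the client (NameNode RPC) port,
    # never on the Service RPC port.
    prefix = "ipc.%d." % client_port
    props[prefix + "callqueue.impl"] = "org.apache.hadoop.ipc.FairCallQueue"
    props[prefix + "backoff.enable"] = "true"
    props[prefix + "scheduler.impl"] = "org.apache.hadoop.ipc.DecayRpcScheduler"
    props[prefix + "scheduler.priority.levels"] = "3"
    props[prefix + "decay-scheduler.backoff.responsetime.enable"] = "true"
    props[prefix + "decay-scheduler.backoff.responsetime.thresholds"] = "10,20,30"
    return props


if __name__ == "__main__":
    # Host names below are placeholders, not real cluster values.
    hosts = {"nn1": "nn1.example.com", "nn2": "nn2.example.com"}
    for key, value in per_user_metrics_properties("nnha", hosts, 8020, 8050).items():
        print("%s=%s" % (key, value))
```

Adding a third NameNode is then just one more entry in the hosts mapping, which matches the "one additional line for each additional NameNode" rule above.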

dougspadotto_h · ‎01-18-2018

Hi Dinesh. These are default values for recent versions of Hive (0.13.0 and later). Sources: https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.server2.async.exec.threads and http://atlas.apache.org/Bridge-Hive.html

Last Visited	‎12-08-2021 02:51 PM

Member Since	‎10-04-2016 05:35 PM
Posts	243
Kudos received	276

Cloudera Community

Re: Hortonworks HDPCA Practice Exam V3 Task.

Re: Spark 1.6 - Dataframe read json throws org.apa...

Re: Service 'webhcat' check failed: RA080 Can't de...

Re: Unable to see HDFS metrics in Grafana

Re: Spark sort by key with descending order

Automatically compress Hive LLAP logs

Re: Building Hadoop on MacOS : An Ant BuildExcepti...

Re: Hortonworks HDPCA Practice Exam V3 Task.

Custom Java Applications throw ClassNotFoundExcept...

HDP Upgrade: Hive Metastore restart fails with "Va...

Re: Service 'webhcat' check failed: RA080 Can't de...

Re: How to increase timeout value for Namenode res...

Re: How to force Ambari to avoid HBase Snapshot St...

How To : In HA environment, enable HDFS per user m...

Re: How to : Correctly configuring Apache Hive Hoo...