Member since
10-04-2016
243
Posts
281
Kudos Received
43
Solutions
09-24-2018
08:44 PM
@kkanchu Thanks for pointing out! Updated the article.
09-05-2018
09:05 PM
2 Kudos
If you have started using Hive LLAP, you will have noticed that by default it is configured to use log4j2. The default configuration uses advanced log4j2 features such as rolling over logs based on time interval and size. Over time, many old log files accumulate. With log4j1 you would typically compress those files manually, or add additional jars and change the configuration, to achieve the same result. With log4j2, a simple configuration change ensures that every time a log file is rolled over, it is compressed for optimal use of storage space.

Default configuration:

To automatically compress the rolled-over log files, update the highlighted line to:

appender.DRFA.filePattern = ${sys:hive.log.dir}/${sys:hive.log.file}.%d{yyyy-MM-dd}-%i.gz

-%i ensures that in the rare scenario where logging has increased and the threshold size is reached more than once in the specified interval, the previously rolled-over file is not overwritten.
.gz ensures that files are compressed using gzip.

To understand the finer details of log4j2 appenders, check the official documentation. You can make similar changes to the llap-cli log settings.
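The edit described above can also be scripted. The following is a minimal sketch, assuming the properties file contains a single `appender.DRFA.filePattern` line and that its current value has no compression suffix; the helper name `enable_gzip_rollover` is hypothetical and not part of Hive or log4j2.

```python
import re

# Matches the DRFA file-pattern line. The appender name "DRFA" comes from the
# article; lines for any other appender are left untouched.
_PATTERN_LINE = re.compile(r"^(appender\.DRFA\.filePattern\s*=\s*)(\S+)\s*$")


def enable_gzip_rollover(properties_text):
    """Return the properties text with the DRFA filePattern ending in -%i.gz."""
    out = []
    for line in properties_text.splitlines():
        m = _PATTERN_LINE.match(line)
        if m and not m.group(2).endswith(".gz"):
            pattern = m.group(2)
            if not pattern.endswith("-%i"):
                pattern += "-%i"  # counter keeps same-interval rollovers distinct
            line = m.group(1) + pattern + ".gz"  # .gz asks log4j2 to gzip the file
        out.append(line)
    # Preserve a trailing newline if the input had one.
    tail = "\n" if properties_text.endswith("\n") else ""
    return "\n".join(out) + tail
```

The function is idempotent, so running it on an already updated file changes nothing. Apply it to a copy of the configuration and review the result before saving it through Ambari.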
12-01-2017
06:06 PM
2 Kudos
When running a custom Java application that connects to Hive via JDBC, the application fails to start after migration to HDP-2.6.x with a NoClassDefFoundError or ClassNotFoundException related to a Hive class, for example:

Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hive/service/cli/thrift/TCLIService$Iface
at org.apache.hive.jdbc.HiveDriver.connect(HiveDriver.java:105)
at java.sql.DriverManager.getConnection(DriverManager.java:664)
at java.sql.DriverManager.getConnection(DriverManager.java:270)

Root Cause

Prior to HDP-2.6.x, hive-jdbc.jar is a symlink that points to the "standalone" JDBC jar (the one intended for non-Hadoop apps, such as a generic app that accesses a database through a JDBC driver). For example, in HDP 2.5.0:

/usr/hdp/current/hive-client/lib/hive-jdbc.jar -> hive-jdbc-1.2.1000.2.5.0.0-1245-standalone.jar

From HDP-2.6.x onwards, hive-jdbc.jar points to the "hadoop env" JDBC driver, which depends on many other Hadoop JARs. For example, in HDP 2.6.2:

/usr/hdp/current/hive-client/lib/hive-jdbc.jar -> hive-jdbc-1.2.1000.2.6.2.0-205.jar

or in HDP-2.6.3:

/usr/hdp/current/hive-client/lib/hive-jdbc.jar -> hive-jdbc-1.2.1000.2.6.3.0-235.jar

Does this mean the HDP stack no longer includes a standalone JAR? No. The standalone jar has been moved to this path:

/usr/hdp/current/hive-client/jdbc

Two ways to solve this:

1. Change the custom Java application's classpath to use the hive-jdbc-*-standalone.jar explicitly. As noted above, the standalone jar is now available in a different path. For example, in HDP-2.6.2:

/usr/hdp/current/hive-client/jdbc/hive-jdbc-1.2.1000.2.6.2.0-205-standalone.jar

In HDP-2.6.3:

/usr/hdp/current/hive-client/jdbc/hive-jdbc-1.2.1000.2.6.3.0-235-standalone.jar

2. Add the following to the HADOOP_CLASSPATH of the custom Java application if it uses other Hadoop components/JARs:

/usr/hdp/current/hive-client/lib/hive-metastore-*.jar:/usr/hdp/current/hive-client/lib/hive-common-*.jar:/usr/hdp/current/hive-client/lib/hive-cli-*.jar:/usr/hdp/current/hive-client/lib/hive-exec-*.jar:/usr/hdp/current/hive-client/lib/hive-service.jar:/usr/hdp/current/hive-client/lib/libfb303-*.jar:/usr/hdp/current/hive-client/lib/libthrift-*.jar:/usr/hdp/current/hadoop-client/lib/log4j*.jar:/usr/hdp/current/hadoop-client/lib/slf4j-api-*.jar:/usr/hdp/current/hadoop-client/lib/slf4j-log4j12-*.jar:/usr/hdp/current/hadoop-client/lib/commons-logging-*.jar
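
Because the standalone jar's file name carries the HDP build number, a launcher script may prefer to discover it rather than hard-code it. The sketch below is one way to do that, assuming the jar lives under the /usr/hdp/current/hive-client/jdbc directory mentioned above; the function names are hypothetical.

```python
import fnmatch
import os


def pick_standalone_jar(filenames):
    """Return the standalone Hive JDBC jar name from a directory listing, or None."""
    matches = sorted(
        f for f in filenames if fnmatch.fnmatch(f, "hive-jdbc-*-standalone.jar")
    )
    # If several builds are present, take the lexically last one. This is a
    # plain string sort, not a true version comparison.
    return matches[-1] if matches else None


def standalone_jar_path(jdbc_dir="/usr/hdp/current/hive-client/jdbc"):
    """Return the full path of the standalone jar under jdbc_dir, or None."""
    if not os.path.isdir(jdbc_dir):
        return None
    name = pick_standalone_jar(os.listdir(jdbc_dir))
    return os.path.join(jdbc_dir, name) if name else None
```

The returned path can then be placed on the application's classpath, which corresponds to option 1 above.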
11-16-2017
03:58 PM
2 Kudos
Description

During an HDP upgrade, the Hive Metastore restart step fails with the message "ValueError: time data '2017-05-10 19:08:30' does not match format '%Y-%m-%d %H:%M:%S.%f'". The stack trace is:

Traceback (most recent call last):
  File "/var/lib/ambari-agent/cache/common-services/HIVE/0.12.0.2.0/package/scripts/hive_metastore.py", line 211, in <module>
    HiveMetastore().execute()
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 329, in execute
    method(env)
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 841, in restart
    self.pre_upgrade_restart(env, upgrade_type=upgrade_type)
  File "/var/lib/ambari-agent/cache/common-services/HIVE/0.12.0.2.0/package/scripts/hive_metastore.py", line 118, in pre_upgrade_restart
    self.upgrade_schema(env)
  File "/var/lib/ambari-agent/cache/common-services/HIVE/0.12.0.2.0/package/scripts/hive_metastore.py", line 150, in upgrade_schema
    status_params.tmp_dir)
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/functions/security_commons.py", line 242, in cached_kinit_executor
    if (now - datetime.strptime(last_run_time, "%Y-%m-%d %H:%M:%S.%f") > timedelta(minutes=expiration_time)):
  File "/usr/lib64/python2.6/_strptime.py", line 325, in _strptime
    (data_string, format))
ValueError: time data '2017-05-10 19:08:30' does not match format '%Y-%m-%d %H:%M:%S.%f'

Root cause

During the upgrade, data is read from a file such as *_tmp.txt under the /var/lib/ambari-agent/tmp/kinit_executor_cache directory. This issue occurs if this file has not been updated and points to an older date.

Solution

1. Log in to the Hive Metastore host.
2. Move the *_tmp.txt files:
   mv /var/lib/ambari-agent/tmp/kinit_executor_cache/*_tmp.txt /tmp
3. Retry the Restart Hive Metastore step from the Ambari Upgrade screen.
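
The failure in the traceback can be reproduced in isolation: the format string requires a fractional-seconds part ("%f"), and the cached value has none. The sketch below illustrates the mismatch and a tolerant parser that accepts both shapes. It is an illustration only, not Ambari's code; the name `parse_cached_time` is hypothetical.

```python
from datetime import datetime

# Format used in the traceback, followed by a fallback without fractional seconds.
FORMATS = ("%Y-%m-%d %H:%M:%S.%f", "%Y-%m-%d %H:%M:%S")


def parse_cached_time(value):
    """Parse a timestamp with or without fractional seconds."""
    for fmt in FORMATS:
        try:
            return datetime.strptime(value, fmt)
        except ValueError:
            continue
    raise ValueError("unrecognised timestamp: %r" % value)


if __name__ == "__main__":
    cached = "2017-05-10 19:08:30"
    try:
        datetime.strptime(cached, FORMATS[0])  # same call shape as the traceback
    except ValueError as exc:
        print("strict parse fails:", exc)
    print("tolerant parse:", parse_cached_time(cached))
```

Moving the stale cache files aside, as in the solution above, avoids the strict parse altogether because there is no old value left to read.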
11-09-2017
03:23 AM
2 Kudos
During an upgrade, a NameNode restart that times out may not appear to be a problem: the request times out in the Ambari UI, but the restart process continues to run in the background. However, this can lead to inconsistencies in the Ambari database and cause further issues at the Finalize Upgrade step.

Note: This article applies only up to Ambari-2.5.x, and the steps must be performed before starting the upgrade process. From Ambari-2.6 onwards it is a one-step change: you only need to update the ambari.properties file instead of making the XML changes listed below. For Ambari-2.6 onwards, refer to this document: https://docs.hortonworks.com/HDPDocuments/Ambari-2.6.0.0/bk_ambari-upgrade/content/preparing_to_upgrade_ambari_and_hdp.html

Caution: The steps described below are a hack! Changing the upgrade XML files to suit your needs is not recommended. Please exercise caution and weigh the risks before following the steps.

You can increase the timeout for the NameNode restart using the following steps before you start the upgrade process:

Step 1: Locate the upgrade file on the Ambari server host. If you are upgrading from HDP-2.5 to HDP-2.6, the files are:

/var/lib/ambari-server/resources/stacks/HDP/2.5/upgrades/nonrolling-upgrade-2.6.xml [Express Upgrade]
/var/lib/ambari-server/resources/stacks/HDP/2.5/upgrades/upgrade-2.6.xml [Rolling Upgrade]

Step 2: Change

<service name="HDFS">
<component name="NAMENODE">
<upgrade>
<task xsi:type="restart-task"/>
</upgrade>
</component>

to

<service name="HDFS">
<component name="NAMENODE">
<upgrade>
<task xsi:type="restart-task" timeout-config="upgrade.parameter.nn-restart.timeout"/>
</upgrade>
</component>

Step 3: Add this to ambari.properties:

upgrade.parameter.nn-restart.timeout=XXXXXX

where XXXXXX is the time in seconds.

Step 4: Restart Ambari Server.

Step 5: Now you can move on to your upgrade process.
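
Step 2 can be sketched with Python's standard XML library instead of a manual edit. The example assumes the upgrade pack declares the usual `xsi` namespace (http://www.w3.org/2001/XMLSchema-instance) on its root element and that its elements carry no default namespace; the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"
# Keep the "xsi" prefix when the tree is serialised again.
ET.register_namespace("xsi", XSI)


def add_nn_restart_timeout(root, key="upgrade.parameter.nn-restart.timeout"):
    """Set timeout-config on NAMENODE restart tasks; return how many were changed."""
    changed = 0
    path = ".//service[@name='HDFS']/component[@name='NAMENODE']/upgrade/task"
    for task in root.findall(path):
        # Only touch restart tasks, as in the article's before/after snippets.
        if task.get("{%s}type" % XSI) == "restart-task":
            task.set("timeout-config", key)
            changed += 1
    return changed
```

Note that ElementTree discards XML comments when it parses a file, so writing the tree back would drop any comments in the upgrade pack. Work on a backup copy, or use the function only to verify a manual edit.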
11-09-2017
01:10 AM
2 Kudos
For this article, I am using Ambari-2.5.2.0 and upgrading HDP-2.5.3 to HDP-2.6.2.

If you have a large HBase cluster, taking an HBase snapshot can take a long time. Ambari performs this step for you as part of the upgrade. However, on a large cluster this step can time out in the Ambari UI and may result in further inconsistencies just before the Finalize Upgrade step.

To overcome this, many people perform a manual HBase snapshot before the upgrade. However, few have found a way to make Ambari skip this step, so they still wait for it to time out before proceeding to the next step of the upgrade. Here is how you can skip the HBase Snapshot step altogether (in case you want to perform it manually before the upgrade).

Caution: The steps described below are a hack! Changing the upgrade XML files to suit your needs is not recommended. Please exercise caution and weigh the risks before following the steps.

The following steps must be performed before starting the upgrade.

Step 1: Locate the upgrade XML file on the ambari-server host:

/var/lib/ambari-server/resources/stacks/HDP/2.5/upgrades/upgrade-2.6.xml (for Rolling Upgrade)
/var/lib/ambari-server/resources/stacks/HDP/2.5/upgrades/nonrolling-upgrade-2.6.xml (for Express Upgrade)

Step 2: Comment out the following piece of code in the upgrade XML file and save it:

<execute-stage service="HBASE" component="HBASE_MASTER" title="Snapshot HBASE">
<task xsi:type="execute" hosts="master">
<script>scripts/hbase_upgrade.py</script>
<function>take_snapshot</function>
</task>
</execute-stage>

Step 3: Restart Ambari Server for it to pick up the changes.

Step 4: Now you can start your upgrade.
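
Step 2 can be sketched as a text substitution that wraps the stage in an XML comment. This is a minimal sketch, assuming the stage is identified by title="Snapshot HBASE" as in the snippet above and contains no "--" sequence (which would be invalid inside a comment); the function name is hypothetical.

```python
import re

# The lookbehind skips a stage that this helper has already commented out.
_STAGE = re.compile(
    r'(?<!<!-- )<execute-stage\b[^>]*\btitle="Snapshot HBASE"[^>]*>.*?</execute-stage>',
    re.DOTALL,
)


def comment_out_hbase_snapshot(xml_text):
    """Wrap the 'Snapshot HBASE' stage in an XML comment; return (text, count)."""
    return _STAGE.subn(lambda m: "<!-- " + m.group(0) + " -->", xml_text)
```

A returned count of 0 means nothing matched, for example because the stage was already commented out by this helper or the pack uses a different title. Keep a backup of the original file before saving the result.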
10-27-2017
04:00 AM
5 Kudos
HDFS per-user metrics aren't emitted by default. Exercise caution before enabling them, and make sure to check the details of the client and service port numbers below.

To use the HDFS - Users dashboard in your Grafana instance, and to view HDFS metrics per user, you need to add these custom properties to your configuration.

Step-by-step guide

Assumption for this guide: this is an HA environment with dfs.internal.nameservices=nnha and dfs.ha.namenodes.nnha=nn1,nn2 set in Ambari under HDFS > Configs > Advanced > Custom hdfs-site.

1. In Ambari, under HDFS > Configs > Advanced > Custom hdfs-site, add the following properties:

dfs.namenode.servicerpc-address.<dfs.internal.nameservices>.nn1=<namenodehost1>:8050
dfs.namenode.servicerpc-address.<dfs.internal.nameservices>.nn2=<namenodehost2>:8050
ipc.8020.callqueue.impl=org.apache.hadoop.ipc.FairCallQueue
ipc.8020.backoff.enable=true
ipc.8020.scheduler.impl=org.apache.hadoop.ipc.DecayRpcScheduler
ipc.8020.scheduler.priority.levels=3
ipc.8020.decay-scheduler.backoff.responsetime.enable=true
ipc.8020.decay-scheduler.backoff.responsetime.thresholds=10,20,30

If you have already enabled the Service RPC port, you can skip the first two lines about servicerpc-address. Replace 8020 with your NameNode RPC port if it is different. DO NOT replace it with the Service RPC port or the DataNode Lifeline port.

2. After this change you may see issues such as both NameNodes shown as Active, or both as Standby, in Ambari. To avoid this:

a. Stop the ZKFC on both NameNodes.
b. Run the following commands from one of the NameNode hosts as the hdfs user:

su - hdfs
hdfs zkfc -formatZK

c. Restart all ZKFCs.

3. Restart HDFS. You should now see the metrics being emitted.

4. After a few minutes, you should also be able to use the HDFS - Users dashboard in Grafana.

Things to ensure:

Client port: 8020 (if different, replace it with the appropriate port in all keys)
Service port: 8021 (if different, replace it with the appropriate port in the first value)
namenodehost1 and namenodehost2: must be replaced with actual values from the cluster and must be FQDNs.
dfs.internal.nameservices: must be replaced with the actual value from the cluster. Example:
dfs.namenode.servicerpc-address.nnha.nn1=<namenodehost1>:8050
dfs.namenode.servicerpc-address.nnha.nn2=<namenodehost2>:8050

* For more than 2 NameNodes in your HA environment, add one additional line for each additional NameNode:
dfs.namenode.servicerpc-address.<dfs.internal.nameservices>.nnX=<namenodehostX>:8021

Adapted from this wiki, which describes how to enable per-user HDFS metrics for a non-HA environment.

Note: This article has been validated against Ambari-2.5.2 and HDP-2.6.2. It will not work in older versions of Ambari due to this bug: https://issues.apache.org/jira/browse/AMBARI-21640
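
Since every key depends on the nameservice, the NameNode IDs, and the two ports, generating the property set can reduce copy-and-paste mistakes. The following is a minimal sketch; the function name and the example host names are hypothetical, and both ports are passed explicitly so that the values from your own cluster are used.

```python
def per_user_metrics_properties(nameservice, namenode_hosts, client_port,
                                service_port, include_servicerpc=True):
    """Build the Custom hdfs-site properties listed in the guide.

    namenode_hosts maps NameNode IDs (e.g. "nn1") to FQDNs.
    client_port is the NameNode RPC port; service_port is the Service RPC port.
    """
    props = {}
    if include_servicerpc:
        # One servicerpc-address entry per NameNode in the HA pair (or more).
        for nn_id, host in namenode_hosts.items():
            key = "dfs.namenode.servicerpc-address.%s.%s" % (nameservice, nn_id)
            props[key] = "%s:%d" % (host, service_port)
    # The ipc.* keys are scoped to the client RPC port, never the service port.
    prefix = "ipc.%d." % client_port
    props[prefix + "callqueue.impl"] = "org.apache.hadoop.ipc.FairCallQueue"
    props[prefix + "backoff.enable"] = "true"
    props[prefix + "scheduler.impl"] = "org.apache.hadoop.ipc.DecayRpcScheduler"
    props[prefix + "scheduler.priority.levels"] = "3"
    props[prefix + "decay-scheduler.backoff.responsetime.enable"] = "true"
    props[prefix + "decay-scheduler.backoff.responsetime.thresholds"] = "10,20,30"
    return props


if __name__ == "__main__":
    hosts = {"nn1": "nn1.example.com", "nn2": "nn2.example.com"}  # placeholder FQDNs
    for key, value in sorted(per_user_metrics_properties("nnha", hosts, 8020, 8021).items()):
        print("%s=%s" % (key, value))
```

Pass include_servicerpc=False if the Service RPC port is already enabled, which matches the guide's advice to skip the first two lines in that case.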
10-18-2017
11:54 PM
3 Kudos
This article is an extension to the official HDP document. In addition to following the steps listed in that document, perform the following checks to ensure the hook is configured correctly and does not cause errors when you start executing queries in Hive.

1. In hive-site.xml, verify that hive.server2.async.exec.threads is not set to 1. If it is, increase it to 100.
2. In hive-site.xml, verify that the max thread pool size is not set to 1. Increase it to 5 to begin with; you may need to increase it further depending on the load.

Recommended values:
<property>
<name>atlas.hook.hive.maxThreads</name>
<value>5</value>
</property>
<property>
<name>hive.server2.async.exec.threads</name>
<value>100</value>
</property>
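
The two checks can be automated against a copy of hive-site.xml. The sketch below assumes the standard Hadoop layout of `<property>` elements with `<name>` and `<value>` children; the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Properties the article says must not be left at 1.
CHECKED = ("hive.server2.async.exec.threads", "atlas.hook.hive.maxThreads")


def read_properties(hive_site_xml):
    """Return {name: value} for every <property> in the given XML text."""
    root = ET.fromstring(hive_site_xml)
    return {p.findtext("name"): p.findtext("value") for p in root.iter("property")}


def hook_thread_warnings(props):
    """Return a warning for each checked property that is explicitly set to 1."""
    warnings = []
    for name in CHECKED:
        if (props.get(name) or "").strip() == "1":
            warnings.append("%s is set to 1" % name)
    return warnings
```

A property that is absent produces no warning here, because the check only flags an explicit value of 1.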
10-18-2017
11:39 PM
2 Kudos
Scenario: The cluster is using both Hive and Atlas components. Sometimes a simple query like 'show databases' fails with the error stack shown below:

beeline> show databases;
Getting log thread is interrupted, since query is done!
Error: Error while processing statement: FAILED: Hive Internal Error: java.util.concurrent.RejectedExecutionException(Task java.util.concurrent.FutureTask@e871c01 rejected from java.util.concurrent.ThreadPoolExecutor@5b868755[Running, pool size = 1, active threads = 1, queued tasks = 10000, completed tasks = 14807]) (state=08S01,code=12)
java.sql.SQLException: Error while processing statement: FAILED: Hive Internal Error: java.util.concurrent.RejectedExecutionException(Task java.util.concurrent.FutureTask@e871c01 rejected from java.util.concurrent.ThreadPoolExecutor@5b868755[Running, pool size = 1, active threads = 1, queued tasks = 10000, completed tasks = 14807])
at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:282)
at org.apache.hive.beeline.Commands.execute(Commands.java:848)
at org.apache.hive.beeline.Commands.sql(Commands.java:713)
at org.apache.hive.beeline.BeeLine.dispatch(BeeLine.java:983)
at org.apache.hive.beeline.BeeLine.execute(BeeLine.java:823)
at org.apache.hive.beeline.BeeLine.begin(BeeLine.java:781)
at org.apache.hive.beeline.BeeLine.mainWithInputRedirection(BeeLine.java:485)
at org.apache.hive.beeline.BeeLine.main(BeeLine.java:468)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)

HiveServer2 Log:

2017-10-10 14:00:38,985 INFO [HiveServer2-Background-Pool: Thread-273112]: log.PerfLogger (PerfLogger.java:PerfLogBegin(135)) - <PERFLOG method=PostHook.org.apache.atlas.hive.hook.HiveHook from=org.apache.hadoop.hive.ql.Driver>
2017-10-10 14:00:38,986 ERROR [HiveServer2-Background-Pool: Thread-273112]: ql.Driver (SessionState.java:printError(962)) - FAILED: Hive Internal Error: java.util.concurrent.RejectedExecutionException(Task java.util.concurrent.FutureTask@3f389d45 rejected from java.util.concurrent.ThreadPoolExecutor@5b868755[Running, pool size = 1, active threads = 1, queued tasks = 10000, completed tasks = 14807])
java.util.concurrent.RejectedExecutionException: Task java.util.concurrent.FutureTask@3f389d45 rejected from java.util.concurrent.ThreadPoolExecutor@5b868755[Running, pool size = 1, active threads = 1, queued tasks = 10000, completed tasks = 14807]
at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2047)
at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:823)
at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1369)
at java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:112)
at org.apache.atlas.hive.hook.HiveHook.run(HiveHook.java:174)

Root Cause

Users are often led to believe that this issue can be fixed by removing 'org.apache.atlas.hive.hook.HiveHook' from the hive.exec.post.hooks property:

hive.exec.post.hooks=org.apache.hadoop.hive.ql.hooks.ATSHook, org.apache.atlas.hive.hook.HiveHook

However, when you are using both Atlas and Hive, 'org.apache.atlas.hive.hook.HiveHook' should not be removed. Instead, the error clearly indicates that the issue is due to improper thread pool configuration. In this case the max thread pool size is 1 and the waiting queue size is 10000.

Solution

1. In hive-site.xml, verify the value of the property "hive.server2.async.exec.threads". If it is set to 1, increase it to 100.
2. Increase the max thread pool values for the Atlas hook threads in hive-site.xml, for example:

<property>
<name>atlas.hook.hive.maxThreads</name>
<value>5</value>
</property>
<property>
<name>atlas.hook.hive.minThreads</name>
<value>1</value>
</property>
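
The pool state quoted in the root cause can be read straight from the exception text. The sketch below extracts the four counters from a message in the format shown in the logs above; the function name is hypothetical.

```python
import re

# Bracketed state as it appears in the RejectedExecutionException message.
_POOL_STATE = re.compile(
    r"pool size = (\d+), active threads = (\d+), "
    r"queued tasks = (\d+), completed tasks = (\d+)"
)


def parse_pool_state(message):
    """Return the thread pool counters found in the message, or None."""
    m = _POOL_STATE.search(message)
    if m is None:
        return None
    keys = ("pool_size", "active_threads", "queued_tasks", "completed_tasks")
    return dict(zip(keys, map(int, m.groups())))
```

A pool_size of 1 together with a large queued_tasks value is the pattern described above: one worker thread and a full waiting queue.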