Member since
09-28-2015
60
Posts
35
Kudos Received
10
Solutions
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 528 | 06-12-2018 08:36 PM
 | 527 | 12-10-2017 07:17 PM
 | 3220 | 10-27-2017 06:36 AM
 | 1222 | 10-25-2017 06:39 PM
 | 590 | 10-02-2017 11:54 PM
06-12-2018
08:36 PM
@Dhiraj Yes. That should be fine.
... View more
12-10-2017
07:17 PM
1 Kudo
@Gaurav Parmar If you are asking about the numbers 1324256400 (GMT: Monday, December 19, 2011 1:00:00 AM) and 1324303200 (GMT: Monday, December 19, 2011 2:00:00 PM), they are epoch timestamps. I am not sure about your use case, i.e. how/when you are going to supply the timestamp, but this is one reference for converting human-readable dates and times to timestamps and vice versa: https://www.epochconverter.com/
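For a quick illustration, here is a minimal sketch using Python's standard library to do the same conversion as the site above (the two timestamps are the ones from the question):

```python
from datetime import datetime, timezone

# Epoch timestamp (seconds since 1970-01-01 UTC) -> human-readable GMT date.
for epoch in (1324256400, 1324303200):
    dt = datetime.fromtimestamp(epoch, tz=timezone.utc)
    print(epoch, "->", dt.strftime("%A, %B %d, %Y %I:%M:%S %p GMT"))

# And the reverse: a GMT date back to an epoch timestamp.
dt = datetime(2011, 12, 19, 1, 0, 0, tzinfo=timezone.utc)
print(int(dt.timestamp()))  # 1324256400
```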
... View more
11-07-2017
12:38 AM
Thanks. Glad to know that it helped.
... View more
11-07-2017
12:33 AM
4 Kudos
@vrathod This will give a top-level view of the available stacks: http://<host_ip>:8080/api/v1/stacks/ For the HDP stack versions: http://<host_ip>:8080/api/v1/stacks/HDP Hope this helps.
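If you want to hit the same endpoints programmatically, a minimal sketch using Python's requests library is below. The host, port, and admin credentials are placeholders to adjust, and the response layout assumed in the comments is what Ambari's REST API typically returns, so verify it against your version.

```python
import requests

AMBARI = "http://<host_ip>:8080"   # placeholder Ambari server address
AUTH = ("admin", "admin")          # placeholder credentials

# Top-level view of the available stacks.
stacks = requests.get(f"{AMBARI}/api/v1/stacks/", auth=AUTH).json()
print([item["Stacks"]["stack_name"] for item in stacks["items"]])

# Versions available for the HDP stack.
hdp = requests.get(f"{AMBARI}/api/v1/stacks/HDP", auth=AUTH).json()
print([v["Versions"]["stack_version"] for v in hdp["versions"]])
```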
... View more
10-27-2017
06:36 AM
1 Kudo
@Saravanan Ramaraj I assume the question is about the YARN total memory. Ambari uses the smallest-capacity node to begin the calculations with, as it expects a homogeneous cluster. In this case we have a heterogeneous cluster: 1 master with 4 CPU / 16 GB RAM + 1 data node with 8 CPU / 30 GB RAM. Thus, Ambari picks the 16 GB node, assumes the 2nd one to be of the same size, and does the calculation for YARN's NodeManager (NM) memory. I assume that both nodes have a NodeManager running.
- I believe you would have 11 GB as the value for YARN / yarn.nodemanager.resource.memory-mb. Thus, we have 22 GB (11 * 2) available in this case, which is > 16 GB. 16 * 2 = 32 GB, but Ambari takes out the memory required to run other processes outside the YARN workspace (e.g., RM, HBase, etc.), so we have less than 32 GB available, which is expected. It's a good idea to have homogeneous clusters.
===================================================================
However, you can make use of Config Groups in Ambari based on the different hardware profiles. You can create 2 Config Groups (CG) where each CG has one node. By default, there is a default CG (as seen on the YARN configs page) containing both nodes. How to create a CG is exemplified using HBase here: https://docs.hortonworks.com/HDPDocuments/Ambari-2.1.2.0/bk_Ambari_Users_Guide/content/_using_host_config_groups.html
I did the following test in order to reduce the memory for one node; you can similarly bump up the memory for the 30 GB node.
- Starting with a 2-node cluster, where Ambari had given 12 GB to each NM, for a total capacity of 24 GB.
- Created a CG named 'New' and added the 2nd node to it. Then changed YARN / yarn.nodemanager.resource.memory-mb for the 2nd node under 'New' from ~12 GB to ~8 GB.
- State of Node 1 under the 'default' CG:
- Restarted "Affected components" as prompted by Ambari after the above changes.
- The total memory changes from 24 GB to 20 GB now.
Hope this helps.
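To make the arithmetic in the first part concrete, here is a small sketch. It is not Ambari's actual stack-advisor code, just the reasoning above; the 5 GB reserved for non-YARN processes is an assumption implied by an 11 GB NodeManager size on a 16 GB node.

```python
# Ambari sizes yarn.nodemanager.resource.memory-mb from the smallest node,
# then assumes every NodeManager node is that size (homogeneous cluster).
def yarn_total_memory_gb(smallest_node_ram_gb, nm_count, reserved_gb):
    """reserved_gb: memory kept outside YARN for other processes (OS, RM, HBase, ...)."""
    per_nm_gb = smallest_node_ram_gb - reserved_gb
    return per_nm_gb * nm_count

# Heterogeneous cluster from the question: 16 GB master + 30 GB data node,
# both running NodeManagers. Ambari calculates from the 16 GB node.
print(yarn_total_memory_gb(smallest_node_ram_gb=16, nm_count=2, reserved_gb=5))  # 22 GB
```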
... View more
10-26-2017
06:55 PM
@vishwa The following information may help in figuring out why 4 is recommended as the maximum:
- HIVE page > tez-interactive-site/tez.am.resource.memory.mb
- HIVE page > hive.tez.container.size
- YARN page > Number of Node Managers
- YARN page > yarn.nodemanager.resource.memory-mb
- YARN page > yarn.scheduler.minimum-allocation-mb
- YARN page > yarn.nodemanager.resource.cpu-vcores
- Queue percentage for the queue used for LLAP.
Also, a screenshot of the Hive Server Interactive (HSI) page (from where we enable HSI) would help to know the currently set values.
... View more
10-26-2017
05:21 PM
@forest lin Can you also provide the contents of the corresponding "stdout" for hive-application.txt (which has the stderr), taken from the Task Log in Ambari?
... View more
10-25-2017
09:07 PM
@Florin Miron Maybe a snippet of the failing log lines, pasted in the comments, would help.
... View more
10-25-2017
08:19 PM
@Florin Miron It looks like all services are going down after they start (which seems weird). You can look at the individual service logs (.log files) from the terminal and see what the stack trace for the failure / SHUTDOWN message is. By default they are under /var/log/ e.g., NameNode logs: /var/log/hadoop/hdfs/hadoop-hdfs-namenode-<hostname>.log You can attach the logs here.
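If it helps, here is a small hypothetical sketch for pulling the last failure/SHUTDOWN lines out of the HDFS logs; it only assumes the default /var/log layout mentioned above, and the glob can be adjusted for other services.

```python
import glob

# Scan the HDFS daemon logs for ERROR or SHUTDOWN messages.
for path in glob.glob("/var/log/hadoop/hdfs/hadoop-hdfs-*.log"):
    with open(path, errors="replace") as f:
        hits = [line.rstrip() for line in f if "ERROR" in line or "SHUTDOWN" in line]
    if hits:
        print(f"== {path} (last 5 matches) ==")
        print("\n".join(hits[-5:]))
```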
... View more
10-25-2017
07:31 PM
I believe you need to figure out why multiple Spark apps are running. If this is not a production cluster, and no one is going to be affected by restarting Spark, you can look into that option. But this just makes me believe that the configuration setting for how many Spark apps are supposed to run is most probably the difference between your two clusters. I am not enough of a Spark expert to point you to the exact config to look for.
... View more
10-25-2017
06:39 PM
@uri ben-ari You can check it from the YARN Resource Manager UI (RM UI). From the Ambari YARN page, open the RM UI. From the RM UI, you can look at the applications that are running under YARN, check the memory consumption of each application, and compare your clusters for discrepancies. The RM UI shows the list of apps (with Allocated Memory); you can click on a specific app for a detailed look at the queue and memory used.
... View more
10-24-2017
07:55 PM
9 Kudos
If the cluster has only one queue at the root level, named 'default' and consuming 100% of the capacity, Ambari will create a queue named 'llap' when HSI is enabled for the 1st time. Its capacity is set to whichever of the following is smaller:
- the minimum percentage required for LLAP to work, or
- 20% of the cluster's capacity.
--------------------------------------------------------------------------------
If this is not the case, i.e. there is more than one queue in the cluster, the user will have to create/set the queue capacity percentage to be used for the LLAP app. Starting with the minimum required queue capacity (shown below), one can increase the queue percentage in order to add LLAP nodes to the cluster, as queue size is one of the primary drivers of how many NodeManager nodes will be running LLAP. Reference code for calculating the minimum queue size. The following calculations can be a good reference for working out the minimum queue capacity percentage to set, using the following config values as referenced from the Ambari UI:
- Total NodeManager nodes in the Ambari cluster (NMCount). Can be obtained from Ambari's YARN page.
- YARN NodeManager size (YarnNMSize) (yarn-site/yarn.nodemanager.resource.memory-mb)
- YARN minimum container size (YarnMinContSize) (yarn-site/yarn.scheduler.minimum-allocation-mb)
- Slider AM container size (SliderAmSize) (hive-interactive-env/slider_am_container_mb). It is calculated as shown here.
- Hive Tez container size (HiveTezContSize) (hive-interactive-site/hive.tez.container.size)
- Tez AM container size (TezAmContSize) (tez-interactive-site/tez.am.resource.memory.mb)
The NormalizeUp() function is used to normalize the 1st parameter w.r.t. the 2nd parameter (YarnMinContSize). The code reference is here; the snippet function can be used for the calculation by putting it in a Python file and calling it with the correct params, or by doing the calculation manually.
Min. total capacity required for the queue to run LLAP (MinCapForLlapQueue) =
NormalizeUp(SliderAmSize, YarnMinContSize) +
NormalizeUp(HiveTezContSize, YarnMinContSize) +
NormalizeUp(TezAmContSize, YarnMinContSize)
Total Cluster Capacity (ClusterCap) = NMCount * YarnNMSize
Min. Queue Percentage Required for queue used for LLAP (in %) (MinQueuePerc) = MinCapForLlapQueue * 100 / ClusterCap
Thus, the 'MinQueuePerc' value can be used to set the queue size to be used for the LLAP app. The queue percentage can be changed from Ambari > Views > YARN Queue Manager.
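Here is a minimal sketch of the calculation above in Python. The NormalizeUp() behaviour (rounding the value up to the next multiple of the YARN minimum container size) is my reading of the referenced code; all memory values are in MB.

```python
import math

def normalize_up(value_mb, yarn_min_cont_size_mb):
    # Round value_mb up to the next multiple of the YARN minimum container size.
    return int(math.ceil(float(value_mb) / yarn_min_cont_size_mb) * yarn_min_cont_size_mb)

def min_llap_queue_percent(nm_count, yarn_nm_size_mb, yarn_min_cont_size_mb,
                           slider_am_size_mb, hive_tez_cont_size_mb, tez_am_cont_size_mb):
    min_cap_for_llap_queue = (normalize_up(slider_am_size_mb, yarn_min_cont_size_mb)
                              + normalize_up(hive_tez_cont_size_mb, yarn_min_cont_size_mb)
                              + normalize_up(tez_am_cont_size_mb, yarn_min_cont_size_mb))
    cluster_cap = nm_count * yarn_nm_size_mb
    return min_cap_for_llap_queue * 100.0 / cluster_cap
```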
... View more
10-24-2017
07:36 PM
@Sudheer Velagapudi - You have provided the information that the cluster in question has "15 nodes with 10 datanodes". For LLAP, since it is a YARN app, what matters is the number of NodeManagers (NM) and the size/memory of each NM.
- Further, queue percentages would be helpful only together with the NM sizes, as it all boils down to the memory that LLAP ends up getting as part of the selected queue (which indirectly tells us which NMs get used).
- The recommended value is typically based on the performance and the load expected on the LLAP nodes. However, based on the current information at hand, I can provide pointers on the minimum typically set from the Ambari perspective, if Ambari ends up creating the queue named 'llap'. Ambari creates the 'llap' queue to be used by the LLAP app only if there is just one queue at the root level, named 'default' and consuming 100% of the capacity. Otherwise, the user has to create/select the queue to be used for LLAP, as in your case here.
---------------------------------------------------------------------------------
Starting with the minimum required queue capacity (shown below), one can increase the queue size in order to add LLAP nodes to the cluster, as queue size is one of the primary drivers of how many NodeManager nodes will be running LLAP. Reference code for calculating the minimum queue size. We typically set it based on the following calculations, which use the following config values as referenced from the Ambari UI:
- Total NodeManager nodes in the Ambari cluster (NMCount). Can be obtained from Ambari's YARN page.
- YARN NodeManager size (YarnNMSize) (yarn-site/yarn.nodemanager.resource.memory-mb)
- YARN minimum container size (YarnMinContSize) (yarn-site/yarn.scheduler.minimum-allocation-mb)
- Slider AM container size (SliderAmSize) (hive-interactive-env/slider_am_container_mb). It is calculated as shown here.
- Hive Tez container size (HiveTezContSize) (hive-interactive-site/hive.tez.container.size)
- Tez AM container size (TezAmContSize) (tez-interactive-site/tez.am.resource.memory.mb)
The NormalizeUp() function is used to normalize the 1st parameter w.r.t. the 2nd parameter (YarnMinContSize). The code reference is here; the snippet function can be used for the calculation by putting it in a Python file and calling it with the correct params, or by doing the calculation manually.
Min. total capacity required for the queue to run LLAP (MinCapForLlapQueue) = NormalizeUp(SliderAmSize, YarnMinContSize) +
NormalizeUp(HiveTezContSize, YarnMinContSize) +
NormalizeUp(TezAmContSize, YarnMinContSize)
Total Cluster Capacity (ClusterCap) = NMCount * YarnNMSize
Min. Queue Percentage Required for queue used for LLAP (in %) (MinQueuePerc) = MinCapForLlapQueue * 100 / ClusterCap
Thus, the 'MinQueuePerc' value can be used to set the queue size to be used for the LLAP app. Hope this helps.
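For illustration only, plugging hypothetical numbers into the formulas above (these are not your cluster's values):

```python
# Hypothetical cluster: 10 NodeManagers of 24 GB (24576 MB) each, 1024 MB minimum
# container, 1024 MB Slider AM, 4096 MB hive.tez.container.size, 2048 MB Tez AM.
min_cap_mb = 1024 + 4096 + 2048      # each value is already a multiple of 1024
cluster_cap_mb = 10 * 24576          # NMCount * YarnNMSize
min_queue_pct = min_cap_mb * 100.0 / cluster_cap_mb
print(round(min_queue_pct, 2))       # ~2.92 -> minimum queue capacity % for LLAP
```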
... View more
10-13-2017
06:33 PM
@azhar shaikh The timeout comes from the finite timeout Ambari puts on Service Check python scripts so that they bail out rather than run forever. The point to note here is that there may be a problem with HBase health in general, which is either preventing the HBase service check from finishing within 300 secs (performance) or the HBase process is not responding at all. Can you check the logs of HBase and the services it depends on to verify that they are in a working state? CC @Chinmay Das
... View more
10-02-2017
11:54 PM
1 Kudo
@Johnny Fugers They are part of the Hortonworks Data Platform (HDP) and are 100% open source under the Apache license. To get support for these products for your enterprise, you can start from this link to explore pricing for support and professional services: https://hortonworks.com/services/support/enterprise/ Phone contact: 1.408.675.0983
... View more
09-11-2017
11:20 PM
The variance for this alert is 2,240,642,366 B, which is 25% of the 8,925,205,907 B average (1,785,041,181 B is the limit). Given that it is coming in as a CRITICAL alert, this doesn't match the 50% growth rate mentioned in the alert as per the 1st screenshot. This doesn't look correct, as it assumes the CRITICAL threshold to be less than 25%. Can you disable and re-enable the alert once, to see if that refreshes it?
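For reference, the arithmetic behind the message (a quick check; it assumes "the limit" reported by the alert is simply the configured threshold applied to the average):

```python
variance = 2_240_642_366   # bytes, from the alert text
average  = 8_925_205_907   # bytes, the reported average
limit    = 1_785_041_181   # bytes, "the limit" in the alert

print(round(variance / average * 100, 1))  # ~25.1 -> actual growth in percent
print(round(limit / average * 100, 1))     # ~20.0 -> threshold the alert applied
```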
... View more
09-11-2017
10:20 PM
@Sam Red Thanks for the screenshot. I need more clarity. You say you are getting the error: "No Under Replicated Blocks No failed Disk Volumes." Are you getting this as the alert text? A working alert (HEALTH/CRITICAL/WARN) looks like this (sample); refer to the instance and response sections: screen-shot-2017-09-11-at-31904-pm.png Can I have the full screenshot containing the instance and response sections?
... View more
09-11-2017
09:11 PM
Can you click on the alert and post a screenshot of its details here? Can you also see if this discussion helps you? https://community.hortonworks.com/questions/65555/hdfs-storage-capacity-usage-alert.html
... View more
09-08-2017
07:03 PM
3 Kudos
New tags can only be created by users who have 50 or more reputation points. If you need one and you don't have enough points, send an email to @Mark Herring. A few guidelines:
- No more spaces in tags
- Tags can only have 25 characters
Original answer: This is the closest I could get.
It doesn't talk about the required points, but covers the guidelines.
https://community.hortonworks.com/page/tagging.html
Maybe we need to go via submitting a tag request, if deemed necessary.
... View more
08-17-2017
07:05 PM
PROBLEM: After upgrading from IOP (v 4.2.5) to HDP (v 2.6.x) in a kerberized setup, HSI start fails with the following in the HSI start log:
WARN impl.LlapZookeeperRegistryImpl: The cluster is not started yet (InvalidACL); will retry
ERROR impl.LlapZookeeperRegistryImpl: Unable to start curator PathChildrenCache
org.apache.zookeeper.KeeperException$InvalidACLException: KeeperErrorCode = InvalidACL for /llap-sasl/user-hive
at org.apache.zookeeper.KeeperException.create(KeeperException.java:121) ~[zookeeper-3.4.6.2.6.2.0-173.jar:3.4.6-173--1]
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51) ~[zookeeper-3.4.6.2.6.2.0-173.jar:3.4.6-173--1]
at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:783) ~[zookeeper-3.4.6.2.6.2.0-173.jar:3.4.6-173--1]
at org.apache.curator.utils.ZKPaths.mkdirs(ZKPaths.java:232) ~[curator-client-2.7.1.jar:?]
at org.apache.curator.utils.EnsurePath$InitialHelper$1.call(EnsurePath.java:148) ~[curator-client-2.7.1.jar:?]
at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:107) ~[curator-client-2.7.1.jar:?]
at org.apache.curator.utils.EnsurePath$InitialHelper.ensure(EnsurePath.java:141) ~[curator-client-2.7.1.jar:?]
at org.apache.curator.utils.EnsurePath.ensure(EnsurePath.java:99) ~[curator-client-2.7.1.jar:?]
at org.apache.curator.framework.recipes.cache.PathChildrenCache.rebuild(PathChildrenCache.java:323) ~[curator-recipes-2.7.1.jar:?]
at org.apache.curator.framework.recipes.cache.PathChildrenCache.start(PathChildrenCache.java:300) ~[curator-recipes-2.7.1.jar:?]
at org.apache.hadoop.hive.llap.registry.impl.LlapZookeeperRegistryImpl.checkPathChildrenCache(LlapZookeeperRegistryImpl.java:827) [hive-exec-2.1.0.2.6.2.0-173.jar:2.1.0.2.6.2.0-173]
at org.apache.hadoop.hive.llap.registry.impl.LlapZookeeperRegistryImpl.getInstances(LlapZookeeperRegistryImpl.java:790) [hive-exec-2.1.0.2.6.2.0-173.jar:2.1.0.2.6.2.0-173]
at org.apache.hadoop.hive.llap.registry.impl.LlapRegistryService.getInstances(LlapRegistryService.java:139) [hive-exec-2.1.0.2.6.2.0-173.jar:2.1.0.2.6.2.0-173]
at org.apache.hadoop.hive.llap.cli.LlapStatusServiceDriver.populateAppStatusFromLlapRegistry(LlapStatusServiceDriver.java:579) [hive-llap-server-2.1.0.2.6.2.0-173.jar:2.1.0.2.6.2.0-173]
at org.apache.hadoop.hive.llap.cli.LlapStatusServiceDriver.run(LlapStatusServiceDriver.java:285) [hive-llap-server-2.1.0.2.6.2.0-173.jar:2.1.0.2.6.2.0-173]
at org.apache.hadoop.hive.llap.cli.LlapStatusServiceDriver.main(LlapStatusServiceDriver.java:914) [hive-llap-server-2.1.0.2.6.2.0-173.jar:2.1.0.2.6.2.0-173]
ERROR cli.LlapStatusServiceDriver: FAILED: Failed to get instances from llap registry
org.apache.hadoop.hive.llap.cli.LlapStatusServiceDriver$LlapStatusCliException: Failed to get instances from llap registry
at org.apache.hadoop.hive.llap.cli.LlapStatusServiceDriver.populateAppStatusFromLlapRegistry(LlapStatusServiceDriver.java:581) [hive-llap-server-2.1.0.2.6.2.0-173.jar:2.1.0.2.6.2.0-173]
at org.apache.hadoop.hive.llap.cli.LlapStatusServiceDriver.run(LlapStatusServiceDriver.java:285) [hive-llap-server-2.1.0.2.6.2.0-173.jar:2.1.0.2.6.2.0-173]
at org.apache.hadoop.hive.llap.cli.LlapStatusServiceDriver.main(LlapStatusServiceDriver.java:914) [hive-llap-server-2.1.0.2.6.2.0-173.jar:2.1.0.2.6.2.0-173]
Caused by: java.io.IOException: org.apache.zookeeper.KeeperException$InvalidACLException: KeeperErrorCode = InvalidACL for /llap-sasl/user-hive
at org.apache.hadoop.hive.llap.registry.impl.LlapZookeeperRegistryImpl.checkPathChildrenCache(LlapZookeeperRegistryImpl.java:836) ~[hive-exec-2.1.0.2.6.2.0-173.jar:2.1.0.2.6.2.0-173]
at org.apache.hadoop.hive.llap.registry.impl.LlapZookeeperRegistryImpl.getInstances(LlapZookeeperRegistryImpl.java:790) ~[hive-exec-2.1.0.2.6.2.0-173.jar:2.1.0.2.6.2.0-173]
at org.apache.hadoop.hive.llap.registry.impl.LlapRegistryService.getInstances(LlapRegistryService.java:139) ~[hive-exec-2.1.0.2.6.2.0-173.jar:2.1.0.2.6.2.0-173]
at org.apache.hadoop.hive.llap.cli.LlapStatusServiceDriver.populateAppStatusFromLlapRegistry(LlapStatusServiceDriver.java:579) ~[hive-llap-server-2.1.0.2.6.2.0-173.jar:2.1.0.2.6.2.0-173]
... 2 more
Caused by: org.apache.zookeeper.KeeperException$InvalidACLException: KeeperErrorCode = InvalidACL for /llap-sasl/user-hive
at org.apache.zookeeper.KeeperException.create(KeeperException.java:121) ~[zookeeper-3.4.6.2.6.2.0-173.jar:3.4.6-173--1]
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51) ~[zookeeper-3.4.6.2.6.2.0-173.jar:3.4.6-173--1]
at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:783) ~[zookeeper-3.4.6.2.6.2.0-173.jar:3.4.6-173--1]
at org.apache.curator.utils.ZKPaths.mkdirs(ZKPaths.java:232) ~[curator-client-2.7.1.jar:?]
at org.apache.curator.utils.EnsurePath$InitialHelper$1.call(EnsurePath.java:148) ~[curator-client-2.7.1.jar:?]
at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:107) ~[curator-client-2.7.1.jar:?]
at org.apache.curator.utils.EnsurePath$InitialHelper.ensure(EnsurePath.java:141) ~[curator-client-2.7.1.jar:?]
at org.apache.curator.utils.EnsurePath.ensure(EnsurePath.java:99) ~[curator-client-2.7.1.jar:?]
at org.apache.curator.framework.recipes.cache.PathChildrenCache.rebuild(PathChildrenCache.java:323) ~[curator-recipes-2.7.1.jar:?]
at org.apache.curator.framework.recipes.cache.PathChildrenCache.start(PathChildrenCache.java:300) ~[curator-recipes-2.7.1.jar:?]
at org.apache.hadoop.hive.llap.registry.impl.LlapZookeeperRegistryImpl.checkPathChildrenCache(LlapZookeeperRegistryImpl.java:827) ~[hive-exec-2.1.0.2.6.2.0-173.jar:2.1.0.2.6.2.0-173]
at org.apache.hadoop.hive.llap.registry.impl.LlapZookeeperRegistryImpl.getInstances(LlapZookeeperRegistryImpl.java:790) ~[hive-exec-2.1.0.2.6.2.0-173.jar:2.1.0.2.6.2.0-173]
at org.apache.hadoop.hive.llap.registry.impl.LlapRegistryService.getInstances(LlapRegistryService.java:139) ~[hive-exec-2.1.0.2.6.2.0-173.jar:2.1.0.2.6.2.0-173]
at org.apache.hadoop.hive.llap.cli.LlapStatusServiceDriver.populateAppStatusFromLlapRegistry(LlapStatusServiceDriver.java:579) ~[hive-llap-server-2.1.0.2.6.2.0-173.jar:2.1.0.2.6.2.0-173]
... 2 more
FAILED: Failed to get instances from llap registry
INFO LlapStatusServiceDriverConsole: LLAP status unknown
INFO LlapStatusServiceDriverConsole: --------------------------------------------------------------------------------
WARN cli.LlapStatusServiceDriver: Watch mode enabled and got LLAP registry error. Retrying..
WARN impl.LlapZookeeperRegistryImpl: The cluster is not started yet (InvalidACL); will retry
Correspondingly, from the LLAP YARN application log, we see the following:
Caused by: java.io.IOException: Login failure for hive/bug-86157-7.openstacklocal@EXAMPLE.COM from keytab /etc/security/keytabs/hive.llap.zk.sm.keytab: javax.security.auth.login.LoginException: Unable to obtain password from user
at org.apache.hadoop.security.UserGroupInformation.loginUserFromKeytabAndReturnUGI(UserGroupInformation.java:1351) ~[hadoop-common-2.7.3.2.6.2.0-178.jar:?]
at org.apache.hadoop.hive.llap.LlapUtil.loginWithKerberos(LlapUtil.java:78) ~[hive-exec-2.1.0.2.6.2.0-178.jar:2.1.0.2.6.2.0-178]
at org.apache.hadoop.hive.llap.security.SecretManager.createLlapZkConf(SecretManager.java:202) ~[hive-exec-2.1.0.2.6.2.0-178.jar:2.1.0.2.6.2.0-178]
... 4 more
Caused by: javax.security.auth.login.LoginException: Unable to obtain password from user
at com.sun.security.auth.module.Krb5LoginModule.promptForPass(Krb5LoginModule.java:897) ~[?:1.8.0_141]
at com.sun.security.auth.module.Krb5LoginModule.attemptAuthentication(Krb5LoginModule.java:760) ~[?:1.8.0_141]
at com.sun.security.auth.module.Krb5LoginModule.login(Krb5LoginModule.java:617) ~[?:1.8.0_141]
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_141]
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_141]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_141]
at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_141]
at javax.security.auth.login.LoginContext.invoke(LoginContext.java:755) ~[?:1.8.0_141]
at javax.security.auth.login.LoginContext.access$000(LoginContext.java:195) ~[?:1.8.0_141]
at javax.security.auth.login.LoginContext$4.run(LoginContext.java:682) ~[?:1.8.0_141]
at javax.security.auth.login.LoginContext$4.run(LoginContext.java:680) ~[?:1.8.0_141]
at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_141]
at javax.security.auth.login.LoginContext.invokePriv(LoginContext.java:680) ~[?:1.8.0_141]
at javax.security.auth.login.LoginContext.login(LoginContext.java:587) ~[?:1.8.0_141]
at org.apache.hadoop.security.UserGroupInformation.loginUserFromKeytabAndReturnUGI(UserGroupInformation.java:1340) ~[hadoop-common-2.7.3.2.6.2.0-178.jar:?]
at org.apache.hadoop.hive.llap.LlapUtil.loginWithKerberos(LlapUtil.java:78) ~[hive-exec-2.1.0.2.6.2.0-178.jar:2.1.0.2.6.2.0-178]
at org.apache.hadoop.hive.llap.security.SecretManager.createLlapZkConf(SecretManager.java:202) ~[hive-exec-2.1.0.2.6.2.0-178.jar:2.1.0.2.6.2.0-178]
REASON: After the IOP-HDP upgrade, the keytab file "hive.llap.zk.sm.keytab" is readable only by its owner:
# ls -al /etc/security/keytabs/hive.llap.zk.sm.keytab
-r--------. 1 yarn hadoop 428 Aug 15 19:34 hive.llap.zk.sm.keytab
Thus, the hive user is not able to access hive.llap.zk.sm.keytab. The group is missing read permission because the IOP kerberos.json file for YARN has the group permissions removed for it.
FIX: Regenerating keytabs via Ambari fixes the permission issue, as the hive user, being part of the hadoop group, can now access the keytab:
# ls -al /etc/security/keytabs/hive.llap.zk.sm.keytab
-r--r-----. 1 yarn hadoop 428 Aug 15 20:26 hive.llap.zk.sm.keytab
- Once regeneration is done, start HSI again.
P.S.: Hive Server Interactive is made up of 2 sub-components: LLAP and Hive2/HiveServer2.
... View more
08-17-2017
06:51 PM
1 Kudo
PROBLEM: After upgrading from IOP (v 4.2.0) to HDP (v 2.6.x) in a kerberized setup, HSI install fails with "Configuration parameter 'hive.llap.daemon.service.principal' missing in dictionary":
Traceback (most recent call last):
File "/var/lib/ambari-agent/cache/common-services/HIVE/0.12.0.2.0/package/scripts/hive_server_interactive.py", line 616, in <module>
HiveServerInteractive().execute()
File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 329, in execute
method(env)
File "/var/lib/ambari-agent/cache/common-services/HIVE/0.12.0.2.0/package/scripts/hive_server_interactive.py", line 75, in install
import params
File "/var/lib/ambari-agent/cache/common-services/HIVE/0.12.0.2.0/package/scripts/params.py", line 27, in <module>
from params_linux import *
File "/var/lib/ambari-agent/cache/common-services/HIVE/0.12.0.2.0/package/scripts/params_linux.py", line 697, in <module>
hive_llap_principal = (config['configurations']['hive-interactive-site']['hive.llap.daemon.service.principal']).replace('_HOST',hostname.lower())
File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/config_dictionary.py", line 73, in __getattr__
raise Fail("Configuration parameter '" + self.name + "' was not found in configurations dictionary!")
resource_management.core.exceptions.Fail: Configuration parameter 'hive.llap.daemon.service.principal' was not found in configurations dictionary!
Reason: The HSI-related keytabs are missing on the cluster.
Fix: Regenerate keytabs via Ambari.
- Once regeneration is done, you can reinstall HSI and start it.
P.S.: Hive Server Interactive is made up of 2 sub-components: LLAP and Hive2/HiveServer2.
Thanks to @vrathod for reporting it.
... View more
- Find more articles tagged with:
- Ambari
- Cloud & Operations
- hsi
- iop
- Issue Resolution
- keytabs
- llap
05-24-2017
06:30 PM
@Daniel Allardice With the information you posted, it's not easy to gauge what you know and what you don't 🙂
... View more
05-22-2017
05:50 PM
You can click on the Alert Definition name (such as Metrics Collector Process or Metrics Collector - Auto-restart Status) to learn more about the issue. The page that opens after clicking has a Response area when you scroll down; clicking on the Response area will give you the complete information.
... View more
04-20-2017
08:28 PM
1 Kudo
You can configure YARN's Capacity Scheduler from Ambari via the "YARN Queue Manager" view. Click on it and you will get the list of queues that are currently available; from there you can add/update/remove queues.
... View more
04-20-2017
07:57 PM
@Artem Ervits Just to add: today one of my answers (a comment) got lost, and I had to retype it. I am sure I submitted it and saw it getting posted, and later saw it was not there. Once I typed and posted it again, I then saw 2 of my answers (and had to delete one, as both had the same information). I would have waited if it had said it was under moderation, but it didn't say so. So there is some lag/staging happening in the way information shows up after posting.
... View more
04-20-2017
05:41 PM
Thanks @Artem Ervits
... View more
04-19-2017
11:43 PM
Screenshot: Any ideas on this ?
... View more
04-18-2017
01:50 AM
1 Kudo
Issue: HSI's (Tech Preview) LLAP component fails to start in a kerberized setup because of missing keytabs.
When HSI is started, its LLAP component fails with the below trace:
INFO impl.LlapRegistryService: Using LLAP registry (client) type: Service LlapRegistryService in state LlapRegistryService: STARTED
INFO state.ConnectionStateManager: State change: CONNECTED
ERROR impl.LlapZookeeperRegistryImpl: Unable to start curator PathChildrenCache. Exception: {}
org.apache.zookeeper.KeeperException$InvalidACLException: KeeperErrorCode = InvalidACL for /llap-sasl/user-hive
at org.apache.zookeeper.KeeperException.create(KeeperException.java:121) ~[zookeeper-3.4.6.2.5.3.0-37.jar:3.4.6-37--1]
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51) ~[zookeeper-3.4.6.2.5.3.0-37.jar:3.4.6-37--1]
at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:783) ~[zookeeper-3.4.6.2.5.3.0-37.jar:3.4.6-37--1]
at org.apache.curator.utils.ZKPaths.mkdirs(ZKPaths.java:232) ~[curator-client-2.7.1.jar:?]
at org.apache.curator.utils.EnsurePath$InitialHelper$1.call(EnsurePath.java:148) ~[curator-client-2.7.1.jar:?]
at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:107) ~[curator-client-2.7.1.jar:?]
at org.apache.curator.utils.EnsurePath$InitialHelper.ensure(EnsurePath.java:141) ~[curator-client-2.7.1.jar:?]
at org.apache.curator.utils.EnsurePath.ensure(EnsurePath.java:99) ~[curator-client-2.7.1.jar:?]
at org.apache.curator.framework.recipes.cache.PathChildrenCache.rebuild(PathChildrenCache.java:323) ~[curator-recipes-2.7.1.jar:?]
at org.apache.curator.framework.recipes.cache.PathChildrenCache.start(PathChildrenCache.java:300) ~[curator-recipes-2.7.1.jar:?]
at org.apache.hadoop.hive.llap.registry.impl.LlapZookeeperRegistryImpl.checkPathChildrenCache(LlapZookeeperRegistryImpl.java:757) [hive-exec-2.1.0.2.5.3.0-37.jar:2.1.0.2.5.3.0-37]
at org.apache.hadoop.hive.llap.registry.impl.LlapZookeeperRegistryImpl.getInstances(LlapZookeeperRegistryImpl.java:725) [hive-exec-2.1.0.2.5.3.0-37.jar:2.1.0.2.5.3.0-37]
at org.apache.hadoop.hive.llap.registry.impl.LlapRegistryService.getInstances(LlapRegistryService.java:129) [hive-exec-2.1.0.2.5.3.0-37.jar:2.1.0.2.5.3.0-37]
at org.apache.hadoop.hive.llap.cli.LlapStatusServiceDriver.populateAppStatusFromLlapRegistry(LlapStatusServiceDriver.java:490) [hive-llap-server-2.1.0.2.5.3.0-37.jar:2.1.0.2.5.3.0-37]
at org.apache.hadoop.hive.llap.cli.LlapStatusServiceDriver.run(LlapStatusServiceDriver.java:245) [hive-llap-server-2.1.0.2.5.3.0-37.jar:2.1.0.2.5.3.0-37]
at org.apache.hadoop.hive.llap.cli.LlapStatusServiceDriver.main(LlapStatusServiceDriver.java:941) [hive-llap-server-2.1.0.2.5.3.0-37.jar:2.1.0.2.5.3.0-37]
ERROR cli.LlapStatusServiceDriver: FAILED: Failed to get instances from llap registry
org.apache.hadoop.hive.llap.cli.LlapStatusServiceDriver$LlapStatusCliException: Failed to get instances from llap registry
at org.apache.hadoop.hive.llap.cli.LlapStatusServiceDriver.populateAppStatusFromLlapRegistry(LlapStatusServiceDriver.java:492) [hive-llap-server-2.1.0.2.5.3.0-37.jar:2.1.0.2.5.3.0-37]
at org.apache.hadoop.hive.llap.cli.LlapStatusServiceDriver.run(LlapStatusServiceDriver.java:245) [hive-llap-server-2.1.0.2.5.3.0-37.jar:2.1.0.2.5.3.0-37]
at org.apache.hadoop.hive.llap.cli.LlapStatusServiceDriver.main(LlapStatusServiceDriver.java:941) [hive-llap-server-2.1.0.2.5.3.0-37.jar:2.1.0.2.5.3.0-37]
Caused by: java.io.IOException: org.apache.zookeeper.KeeperException$InvalidACLException: KeeperErrorCode = InvalidACL for /llap-sasl/user-hive
at org.apache.hadoop.hive.llap.registry.impl.LlapZookeeperRegistryImpl.checkPathChildrenCache(LlapZookeeperRegistryImpl.java:760) ~[hive-exec-2.1.0.2.5.3.0-37.jar:2.1.0.2.5.3.0-37]
at org.apache.hadoop.hive.llap.registry.impl.LlapZookeeperRegistryImpl.getInstances(LlapZookeeperRegistryImpl.java:725) ~[hive-exec-2.1.0.2.5.3.0-37.jar:2.1.0.2.5.3.0-37]
at org.apache.hadoop.hive.llap.registry.impl.LlapRegistryService.getInstances(LlapRegistryService.java:129) ~[hive-exec-2.1.0.2.5.3.0-37.jar:2.1.0.2.5.3.0-37]
at org.apache.hadoop.hive.llap.cli.LlapStatusServiceDriver.populateAppStatusFromLlapRegistry(LlapStatusServiceDriver.java:490) ~[hive-llap-server-2.1.0.2.5.3.0-37.jar:2.1.0.2.5.3.0-37]
... 2 more
This can happen if HSI is enabled after kerberizing the cluster.
Reason:
- HSI needs 2 keytab files, 'hive.service.keytab' and 'hive.llap.zk.sm.keytab', present on all of YARN's NodeManager nodes.
- If HSI is not enabled before the cluster's kerberization, those two keytab files will not get distributed to all the NodeManager nodes, unlike when HSI is enabled before kerberization. Thus the error:
Caused by: org.apache.zookeeper.KeeperException$InvalidACLException: KeeperErrorCode = InvalidACL for /llap-sasl/user-hive at org.apache.zookeeper.KeeperException.create(KeeperException.java:121) ~[zookeeper-3.4.6.2.5.0.0-1245.jar:3.4.6-1245--1]
occurs because the ZK node is not created / is missing:
[zk: localhost:2181(CONNECTED) 3] ls /llap-sasl
[]
zk node is missing
Resolution:
- Regenerating keytabs from the Ambari Kerberos page will distribute the above keytab files to all NodeManager nodes.
- Further, confirm that Hive's config hive.llap.zk.sm.connectionString is updated with the list of all ZooKeeper nodes in the cluster, for example: zk.host1.org:2181,zk.host2.org:2181,zk.host3.org:2181 The ZooKeeper node list can be obtained from here; note to append the port numbers as in the example.
Restart HSI to confirm the behavior.
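If useful, here is a small sketch for checking whether the /llap-sasl znode exists and what ACLs it carries, using the third-party kazoo client (an assumption; it may not be installed on your hosts). Replace the quorum string with your hive.llap.zk.sm.connectionString value.

```python
from kazoo.client import KazooClient

zk = KazooClient(hosts="zk.host1.org:2181,zk.host2.org:2181,zk.host3.org:2181")
zk.start()
try:
    if zk.exists("/llap-sasl"):
        acls, _stat = zk.get_acls("/llap-sasl")
        print("children:", zk.get_children("/llap-sasl"))
        print("acls:", acls)
    else:
        print("/llap-sasl does not exist yet")
finally:
    zk.stop()
```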
... View more
- Find more articles tagged with:
- Ambari
- Data Processing
- FAQ
- Hive
- hive2
- Issue Resolution
- issue-resolution
- llap
04-04-2017
06:28 PM
@Christophe Vico Did you check the Oozie server logs for this issue? Further, is the "oozie admin -oozie <URL> -status" error coming from the Oozie Server alert? Can you post the complete alert stack trace?
... View more