Member since: 01-11-2018
Posts: 33
Kudos Received: 1
Solutions: 1
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 3207 | 04-13-2018 08:38 AM
10-23-2018
10:11 AM
@csguna No, actually we restart HiveServer under controlled conditions when it reaches 25k+ open file descriptors. We now plan to raise the number of file descriptors allowed for HiveServer.
10-21-2018
09:12 AM
@Tomas79 Thank you very much for your reply. We do run Spark apps on our cluster, but in our case we don't see a large number of open operations - both 'Open connections' and 'Open operations' stay at a relatively low level, below 30. I also noticed that I probably labelled the problem a 'leak' prematurely: when cluster utilization is low, the number of open descriptors is also much lower, which wouldn't happen if connections were left hanging in an incorrect state. I'm now coming to the conclusion that this situation is a consequence of high Hive utilization and many small files across HDFS. Thank you for your time, Cheers
10-18-2018
07:24 AM
Hi! On our CDH 5.9.3 cluster we are experiencing a problem with a growing number of open file descriptors in HiveServer2, which forces us to restart the HiveServer instances every 2 weeks or so. In our case most of the open file descriptors are actually TCP connections to a large number of DataNodes on port 1004. Has anybody else had a similar problem before? Is there any fix available for this issue? Can it be related to either of these two bugs: https://issues.apache.org/jira/browse/HIVE-1185 https://issues.apache.org/jira/browse/HIVE-7081 Thanks for any help.
Labels:
- Apache Hive
07-17-2018
03:13 AM
Hi @bgooley, it's working, thank you. Just wanted to add that one may add this configuration to the Hue Server Advanced Configuration Snippet (Safety Valve) for hue_safety_valve_server.ini instead, which makes it possible to give each Hue server instance a different configuration. Thanks!
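For reference, a minimal sketch of the kind of entry that goes into that safety valve (the host and port below are placeholders, and I'm assuming the hive_server_host / hive_server_port key names that current Hue releases document for the [beeswax] section - the exact names may differ between Hue versions):

```ini
# Hue Server Advanced Configuration Snippet (Safety Valve)
# for hue_safety_valve_server.ini - pins this Hue server to one HiveServer2.
[beeswax]
  # hostname and Thrift port of the HiveServer2 this Hue server should use
  hive_server_host=hs2-node01.example.com
  hive_server_port=10000
```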
... View more
07-13-2018
04:59 AM
Hi All, I'd like to ask if it's possible to configure Hue in Cloudera Manager to use a specific HiveServer2. As far as I can see there is only an option for the "Hive service" as a whole, and CM automagically sets beeswax_server_host in hue.ini. I've tried to override this behaviour by adding the section: [beeswax] server_interface=hiveserver2 beeswax_server_port=10000 beeswax_server_host=<hiveserver_ip> to the Hue Service Advanced Configuration Snippet (Safety Valve) and the Hue Server Advanced Configuration Snippet (Safety Valve), as proposed in https://groups.google.com/a/cloudera.org/forum/#!topic/hue-user/WzxrpI4ykW8 but with no success so far. Has anyone tried this configuration? Is this doable in CM, or is the only way to build a HiveServer2 load balancer? Thank you in advance
Labels:
- Apache Hive
- Cloudera Hue
- Cloudera Manager
05-08-2018
04:18 AM
Hi @Harsh J, thanks a million for such a thorough and elaborate answer. I haven't solved the problem yet; I will probably apply the cgroups configuration as suggested. I hope it's going to work, but the reason why a single JVM uses so much CPU is still mysterious to me. I understand that YARN treats a vcore as a rough forecast of how much CPU time will be used, and we could probably mitigate the problem by requesting more vcores in the job's application or by otherwise reducing the number of containers running on a node, but we still wouldn't be guaranteed that some containers wouldn't use even more CPU, up to the total capacity of the server. It looks like containers running many threads, resulting in a CPU share of more than 100% per container, undermines the way YARN dispatches tasks to the nodes. I've also come across this tutorial: https://hortonworks.com/blog/apache-hadoop-yarn-in-hdp-2-2-isolation-of-cpu-resources-in-your-hadoop-yarn-clusters/ which asks: "how do we ensure that containers don't exceed their vcore allocation? What's stopping an errant container from spawning a bunch of threads and consume all the CPU on the node?" It appears that in the past the rule of thumb of 1+ vcores per 1 real core worked (I saw it in several older tutorials), but workload patterns have changed (less IO-bound, more CPU-consuming) and this rule doesn't work very well anymore. So effectively cgroups seem to be the only way to ensure that containers don't exceed their vcore allocation. Let me know if you agree or see other solutions. Thanks a million!
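For anyone else landing here, a rough sketch of the YARN settings involved, expressed as plain yarn-site.xml properties (in CDH these are toggled through Cloudera Manager rather than edited by hand, so treat this purely as an illustration of which knobs cgroups-based CPU isolation uses):

```xml
<!-- Illustrative only: switch the NodeManager to the LinuxContainerExecutor
     with the cgroups resource handler, so container CPU usage is enforced. -->
<property>
  <name>yarn.nodemanager.container-executor.class</name>
  <value>org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor</value>
</property>
<property>
  <name>yarn.nodemanager.linux-container-executor.resources-handler.class</name>
  <value>org.apache.hadoop.yarn.server.nodemanager.util.CgroupsLCEResourcesHandler</value>
</property>
<property>
  <!-- When true, a container is hard-capped at its vcore share
       even if the node has idle CPU. -->
  <name>yarn.nodemanager.linux-container-executor.cgroups.strict-resource-usage</name>
  <value>true</value>
</property>
```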
05-04-2018
03:50 AM
Hi! We have a problem with YARN performance that manifests itself in very high CPU utilization (up to 100%) and growing load on all nodes. There are at most 16 containers running on each node with 20 CPUs available, which gives 40 available vCores (thanks to hyperthreading). However, the Hadoop jobs seem to take more resources than they should - sometimes 500-700% CPU per container in the 'top' output. The YARN configuration seems quite reasonable: yarn.scheduler.minimum-allocation-vcores=1, yarn.scheduler.increment-allocation-vcores=1, yarn.scheduler.maximum-allocation-vcores=16, and all offending jobs request only 1 vCore when checked in the Resource Manager's UI. Despite that configuration, one can see more than 20 threads open for every container, whereas I'd expect no more than 2. Does anyone know why a single container uses so much CPU time? Is it possible to control the number of threads per container with some MapReduce configuration or Java settings? Or maybe this problem comes from flaws in the MapReduce program itself? Would cgroups be a good approach to solve this issue? I'm using CDH 5.9.3 with computation running on CentOS 7.2. Thanks for all answers!
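For reference, the only per-job knobs I know of here are the standard vcore requests below (values are just examples, not our settings); as I understand it they only change how the scheduler accounts for the container, they don't cap the number of threads a task actually spawns:

```xml
<!-- Example only: request 4 vcores per map/reduce container instead of the default 1,
     so a multi-threaded task is packed less densely onto the node. -->
<property>
  <name>mapreduce.map.cpu.vcores</name>
  <value>4</value>
</property>
<property>
  <name>mapreduce.reduce.cpu.vcores</name>
  <value>4</value>
</property>
```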
Labels:
- Apache YARN
- MapReduce
04-13-2018
08:38 AM
I've found the solution - it's possible to use another parameter to prevent users from setting mapred.job.queue.name: hive.conf.restricted.list
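For anyone looking for the concrete shape of that setting, this is roughly how it can be added to the HiveServer2 safety valve for hive-site.xml (the value below is illustrative - keep Hive's default restricted entries and append the queue properties to them rather than replacing them):

```xml
<property>
  <name>hive.conf.restricted.list</name>
  <!-- Hive's default restricted entries plus the queue properties we want to lock down;
       anything on this list can no longer be changed with SET at runtime. -->
  <value>hive.security.authenticator.manager,hive.security.authorization.manager,hive.users.in.admin.role,mapred.job.queue.name,mapreduce.job.queuename</value>
</property>
```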
04-10-2018
11:32 AM
Hi, One of the prerequisites for enabling Sentry is disabling impersonation, which basically means that all queries are executed by the hive user, not the user that actually ran the command in Hue. However, this requirement breaks YARN queue placement, which so far used the user's group name to select the proper queue and now assigns all jobs to hive's group instead. It's possible to avoid this behaviour by putting a 'Specified' placement policy on top - this way we are able to ensure that a job is assigned to a queue based on the user's group, not the default group for hive. However, this solution opens the door to intentional misuse: a user can set the mapred.job.queue.name parameter in a Hue session and circumvent YARN queue placement, potentially using more cluster resources than the administrator intended to give them. We've tried to use the hive.security.authorization.sqlstd.confwhitelist parameter to prevent users from setting mapred.job.queue.name, but apparently it requires enabling Standard Hive Authorization https://cwiki.apache.org/confluence/display/Hive/SQL+Standard+Based+Hive+Authorization#SQLStandardBasedHiveAuthorization-RestrictionsonHiveCommandsandStatements whereas Cloudera doesn't support using native Hive authorization frameworks. In any case, we failed to configure Standard Hive Authorization and Sentry at the same time on our CDH 5.9.3. Moreover, https://www.cloudera.com/documentation/enterprise/5-9-x/topics/sg_sentry_service_config.html clarifies that we are not even able to use hive.security.command.whitelist to block the 'set' command altogether. Taking all this together, we'd like to ask:
1. Is using the 'Specified' placement policy in the YARN queue configuration the only way to activate mapping based on the group of the user logged into Hue, rather than the hive user?
2. Do you know of other ways to block setting specific variables in a Hive session?
3. Is it possible to configure Standard Hive Authorization in CDH at all?
I'll be thankful for any clue. Thanks!
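For context, this is roughly the kind of placement policy we mean, written as plain Fair Scheduler rules (in CM this is configured through the Dynamic Resource Pools page; the rule names below are the standard fair-scheduler.xml ones, not our exact setup):

```xml
<queuePlacementPolicy>
  <!-- 'specified' honours an explicitly requested queue (e.g. mapred.job.queue.name)... -->
  <rule name="specified" create="false"/>
  <!-- ...otherwise place the job in a pool named after the submitting user's primary group. -->
  <rule name="primaryGroup" create="false"/>
  <rule name="default" queue="root.default"/>
</queuePlacementPolicy>
```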
Labels:
- Apache Hive
- Apache Sentry
- Cloudera Hue
04-09-2018
08:51 AM
Hi @Harsh J, thank you for an even more thorough answer; the placement policy is clear now. I hadn't seen the risk of rogue YARN apps before - it's very helpful. Many thanks!