Member since: 09-29-2015
Posts: 286
Kudos Received: 601
Solutions: 60
02-04-2016
04:43 PM
1 Kudo
As per https://community.hortonworks.com/content/kbentry/550/unofficial-storm-and-kafka-best-practices-guide.html, run ZooKeeper on nodes separate from the Kafka brokers. Do not install ZooKeeper nodes on the same node as a Kafka broker if you want optimal Kafka performance: both Kafka and ZooKeeper are disk I/O intensive.
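For illustration, a minimal layout sketch under that guidance (host names are hypothetical, not from the post):

# ZooKeeper quorum on dedicated hosts, each with its own disks
zk1.example.com  zk2.example.com  zk3.example.com          -> ZooKeeper only
# Kafka brokers on separate hosts, each with its own disks
kafka1.example.com  kafka2.example.com  kafka3.example.com -> Kafka broker only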
02-04-2016
04:48 AM
52 Kudos
Introduction

Apache Tez has become a very important framework and API for supporting batch and interactive processing over terabytes and petabytes of data for many engines within HDP, such as Pig, Hive, Java, Cascading, and others, with performance advantages at scale over MapReduce and even Spark at certain data volumes. For more on Apache Tez, see http://hortonworks.com/hadoop/tez/

This article outlines best practices for configuring and tuning Tez, and explains why you would set certain values for certain properties to get performance at scale, with step-by-step instructions. With this in place you should be able to prevent the out-of-memory errors seen when executing Hive queries or Pig scripts, as in:
https://community.hortonworks.com/questions/5780/hive-on-tez-query-map-output-outofmemoryerror-java.html
https://community.hortonworks.com/questions/12067/what-is-the-workaround-when-getting-hive-outofmemo.html

Tez Memory Demystified

I find a diagram usually helps to understand why you would set certain properties. This is a quick summary of the main memory settings for Tez, for both the Application Master and the container. Please refer to it as you read below. A list of some of the main Tez properties can be found here: http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.4/bk_installing_manually_book/content/ref-ffec9e6b-41f4-47de-b5cd-1403b4c4a7c8.1.html I also highly recommend reading the Hive Tuning Guide for HDP.

-----------------------------------------------------------------

Steps to Configure

Step 0 - If you are a Hortonworks Support Subscription customer, begin by utilizing the SmartSense tool. Hortonworks SmartSense is a cluster diagnostic and recommendation tool that is critical for efficient support case resolution, pre-emptive issue detection, and performance tuning. Your recommended Tez configurations are provided to you as a customer; this is the value Hortonworks brings. You can access the white paper here: http://hortonworks.com/info/hortonworks-smartsense/ Upload your bundles, apply the recommendations, and you have no need to go any further in this article. But if you must....

-----------------------------------------------------------------
Step 1 - Determine your YARN NodeManager resource memory (yarn.nodemanager.resource.memory-mb) and your YARN minimum container size (yarn.scheduler.minimum-allocation-mb). Your yarn.scheduler.maximum-allocation-mb is the same as yarn.nodemanager.resource.memory-mb. yarn.nodemanager.resource.memory-mb is the total RAM allocated to YARN on each node of the cluster. Divided across the number of containers, this gives the minimum YARN memory allocation for a container, yarn.scheduler.minimum-allocation-mb, which will be a very important setting for our Tez Application Master and container sizes. So how do we determine this with just the number of cores, disks, and RAM on each node? The Hortonworks easy-button approach: follow the instructions at this link, Determine HDP Memory Config. For example, if you are on HDInsight running a D12 node with 8 CPUs and 28 GB of memory, with no HBase, you run:

python yarn-utils.py -c 8 -m 28 -d 2 -k False

In Ambari, configure the appropriate settings for YARN and MapReduce, or in a non-Ambari-managed cluster, manually add the first three settings to yarn-site.xml and the rest to mapred-site.xml on all nodes. (An illustrative sketch of this step and the next follows Step 2 below.)

-----------------------------------------------------------------

Step 2 - Determine your Tez Application Master and container size, that is, tez.am.resource.memory.mb and hive.tez.container.size. Set tez.am.resource.memory.mb to be the same as yarn.scheduler.minimum-allocation-mb, the YARN minimum container size. Set hive.tez.container.size to be the same as, or a small multiple (1 or 2 times) of, the YARN minimum container size yarn.scheduler.minimum-allocation-mb, but NEVER more than yarn.scheduler.maximum-allocation-mb. You want to leave headroom for multiple containers to be spun up. General guidance: don't exceed the memory per processor, as you want one processor per container. So if you have, for example, 256 GB and 16 cores, you don't want your container to be bigger than 16 GB. Bonus: set container reuse to true with tez.am.container.reuse.enabled (the default is true), and prewarm containers when HiveServer2 starts, under Hive Configurations in Ambari.
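To make Steps 1 and 2 concrete, here is a minimal sketch for the 8-core / 28 GB example above. The yarn-utils.py command is from Step 1; the container sizes below are illustrative assumptions only (your yarn-utils.py output and Ambari recommendations will give the actual values), shown simply to apply the sizing rules.

python yarn-utils.py -c 8 -m 28 -d 2 -k False   # derive YARN/MapReduce settings from 8 cores, 28 GB RAM, 2 disks, no HBase

# Suppose the script recommends a 4096 MB minimum container (an assumed value, not actual output).
# Applying the Step 2 rules:
yarn.scheduler.minimum-allocation-mb=4096
tez.am.resource.memory.mb=4096     # same as the YARN minimum container size
hive.tez.container.size=4096       # or 8192 (1-2x the YARN minimum), never above yarn.scheduler.maximum-allocation-mb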
-----------------------------------------------------------------

Step 3 - Application Master and container Java heap sizes (tez.am.launch.cmd-opts and hive.tez.java.opts, respectively). By default these are BOTH 80% of the container sizes, tez.am.resource.memory.mb and hive.tez.container.size respectively.

NOTE: tez.am.launch.cmd-opts is set automatically, so there is no need to change it.
In HDP 2.3 and above, there is also no need to set hive.tez.java.opts, as it is controlled automatically by a new property, tez.container.max.java.heap.fraction, which defaults to 0.8 in tez-site.xml. This property is not present by default in Ambari; if you wish, you can add it to the Custom tez-site. As you can see in Ambari, under Hive -> Advanced configurations there are no manual memory settings for hive.tez.java.opts. If you want the heap to be 75% of the container, set the Tez Container Java Heap Fraction to 0.75.
If you wish to set this manually, you can add, for example, -Xmx7500m -Xms7500m to hive.tez.java.opts, as long as it is a fraction of hive.tez.container.size.
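As a worked illustration of the heap fraction (the container size here is an assumed example, not a value from the article):

# Assume hive.tez.container.size = 8192 MB.
# With tez.container.max.java.heap.fraction = 0.8 (the default), the task JVM heap is about
#   0.8 * 8192 MB = 6553 MB, leaving the remaining ~20% of the container as headroom.
# With the fraction set to 0.75 instead, it would be 0.75 * 8192 MB = 6144 MB.
# The equivalent manual setting would be roughly: hive.tez.java.opts = -Xmx6553m -Xms6553m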
-----------------------------------------------------------------

Step 4 - Determine the Hive map-join memory settings. tez.runtime.io.sort.mb is the memory used when the output needs to be sorted. tez.runtime.unordered.output.buffer.size-mb is the memory used when the output does not need to be sorted. hive.auto.convert.join.noconditionaltask.size is a very important parameter for sizing the memory used to perform map joins, and you want to perform map joins as much as possible. In Ambari this is under the Hive configuration. For more on this, see http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.4/bk_performance_tuning/content/ch_setting_memory_usage_for_hive_perf.html
SET tez.runtime.io.sort.mb to 40% of hive.tez.container.size. You should rarely need more than 2 GB.
By default hive.auto.convert.join.noconditionaltask = true
SET hive.auto.convert.join.noconditionaltask.size to 1/3 of hive.tez.container.size
SET tez.runtime.unordered.output.buffer.size-mb to 10% of hive.tez.container.size.
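Putting Step 4 together, a minimal sketch assuming hive.tez.container.size = 6144 MB (an illustrative value, not from the article); note that hive.auto.convert.join.noconditionaltask.size is specified in bytes:

set hive.tez.container.size=6144;                             -- illustrative container size in MB
set tez.runtime.io.sort.mb=2048;                              -- 40% of 6144 is ~2458 MB, capped near the 2 GB guidance
set hive.auto.convert.join.noconditionaltask.size=2147483648; -- 1/3 of 6144 MB, expressed in bytes
set tez.runtime.unordered.output.buffer.size-mb=614;          -- ~10% of 6144 MB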
-----------------------------------------------------------------

FOR MORE ADVANCED SETTINGS CONCERNING QUERY OPTIMIZATION

Step 5 - For query optimization and mapper parallelism, see http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.4/bk_performance_tuning/content/ch_query_optimization_hive.html
Step 6 - Determining the number of mappers. The following parameters control the number of mappers for splittable formats with Tez:

set tez.grouping.min-size=16777216; -- 16 MB min split
set tez.grouping.max-size=1073741824; -- 1 GB max split
Increase the min and max split size to reduce the number of mappers. See also "How initial task parallelism works": https://community.hortonworks.com/questions/905/how-are-number-of-mappers-determined-for-a-query-w.html

-----------------------------------------------------------------

References

For Microsoft Azure and HDInsight:
https://azure.microsoft.com/en-us/documentation/articles/hdinsight-hadoop-hive-out-of-memory-error-oom/
https://azure.microsoft.com/en-us/documentation/articles/hdinsight-hadoop-optimize-hive-query/
02-02-2016
07:55 PM
2 Kudos
Download it again, and this time use Firefox rather than Chrome or IE.
02-02-2016
04:56 PM
1 Kudo
Updated HDInsight 3.3 component link is at https://azure.microsoft.com/en-us/documentation/articles/hdinsight-component-versioning/
02-02-2016
09:06 AM
Actually, if you are using HDP, the jar you should use is htrace-core-3.1.0-incubating.jar, found in /usr/hdp/current/hbase-master/lib, instead of htrace-core-2.0.0.jar.
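A quick way to check for the jar on an HDP node (a minimal sketch; HADOOP_CLASSPATH is just one assumed way to pick it up, and how you wire it in depends on your application):

ls /usr/hdp/current/hbase-master/lib/htrace-core-*.jar
export HADOOP_CLASSPATH="${HADOOP_CLASSPATH}:/usr/hdp/current/hbase-master/lib/htrace-core-3.1.0-incubating.jar"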
02-02-2016
08:59 AM
1 Kudo
@Kibrom Gebrehiwot This is the Sandbox connection issue, right?
Did you follow the suggestions I gave at https://community.hortonworks.com/questions/12846/how-to-solve-connection-refused-errors-in-hdp.html#comment-13052 ? I also think you are trying to use the Sandbox in a way it is not intended. It is not meant to sit on a headless server with multiple people accessing it; it is meant to sit on your desktop, one desktop, with a NAT network interface. If you want multiple people to access a cluster, you should build your own.
See this experience from someone else https://community.hortonworks.com/questions/8581/alert-on-sandboxhortonworkscom-hdp-232.html#answer-11170
02-02-2016
08:39 AM
2 Kudos
@Benson Shih Which instructions did you follow? You would need to do the following:
1. Install and configure Solr Cloud: https://cwiki.apache.org/confluence/display/RANGER/How+to+configure+Solr+Cloud+with+Kerberos+for+Ranger+0.5 (just like Storm, you need to enable it for Kerberos).
2. Then enable the plugin manually: https://cwiki.apache.org/confluence/display/RANGER/Apache+Ranger+0.5.0+Installation#ApacheRanger0.5.0Installation-EnablingRangerSolrPlugin
02-02-2016
08:21 AM
1 Kudo
Your user bigotes may not have the privilege to grant SELECT on that object. Do the following:
> su - hive
> hive
hive> grant select on TABLE logs to USER hue;
02-02-2016
08:06 AM
1 Kudo
@Saurabh Kumar Good question. I don't think there is a way to disable the "kill application" button. Perhaps if a user is granted read only permission to the Dashboard in Ambari -> Manage Ambari, he might be unable to do so, but I also see that even a read only user can access the RM UI.
So the only way is to not give that user any permission to the Dashboard, and to restrict access to the RM UI (for example, via Knox secured by Ranger). Perhaps someone else has a better answer.