Member since: 07-31-2019
Posts: 346
Kudos Received: 259
Solutions: 62
My Accepted Solutions
Title | Views | Posted |
---|---|---|
 | 2870 | 08-22-2018 06:02 PM |
 | 1662 | 03-26-2018 11:48 AM |
 | 4095 | 03-15-2018 01:25 PM |
 | 5056 | 03-01-2018 08:13 PM |
 | 1415 | 02-20-2018 01:05 PM |
04-02-2017
11:54 AM
Hi @Bala Vignesh N V, truncating the partition will not remove the partition metadata. To remove the metadata for all partitions, you'll want to include the CASCADE clause in an ALTER TABLE statement. This should remove the column metadata for all partitions. https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-ChangeColumnName/Type/Position/Comment
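For example (a hedged sketch; the table and column names below are hypothetical), a column change with CASCADE propagates the metadata change to every partition:
ALTER TABLE sales CHANGE COLUMN amount amount DECIMAL(10,2) CASCADE;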
03-27-2017
07:24 PM
Hi @Abhijeet Rajput, it is recommended to analyze all tables, ORC included, on a regular basis for performance. Statistics are more valuable on larger tables than on smaller ones. Sorting is not necessary and, in fact, is not allowed on ACID tables. As of HDP 2.5, Hive uses both a rules-based optimizer and a cost-based optimizer (CBO) built on Apache Calcite. Enabling the CBO will make the best use of the statistics. Also, you may want to take a look at LLAP, which is in Technical Preview in HDP 2.5 and will be GA in 2.6. Hope this helps.
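As a rough sketch (the table name is hypothetical, and whether you need a PARTITION clause depends on your table), statistics can be gathered and the CBO-related settings enabled like this:
ANALYZE TABLE sales COMPUTE STATISTICS;
ANALYZE TABLE sales COMPUTE STATISTICS FOR COLUMNS;
SET hive.cbo.enable=true;
SET hive.compute.query.using.stats=true;
SET hive.stats.fetch.column.stats=true;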
03-11-2017
02:35 PM
@ccasano I'd also like to add to my comment that, by default, LLAP leverages the LRFU algorithm with a preemption strategy on the frequently-used side. This means that LLAP will always preempt long-running queries in favor of short, ad hoc queries, yet still allows for the occasional "bring-back-all-the-data" scenario without flushing the ad hoc query cache. This allows for better query concurrency and provides optimal performance for the majority of BI workloads. In addition, LLAP does not run each query in its own YARN container, which would limit concurrency since each container is a user session, i.e. a job. Most BI use cases involve many users running ad hoc queries and then keeping their sessions open as they look at the reports. By leveraging Tez AMs for queries, LLAP achieves much higher concurrency. With the combination of LLAP's LRFU algorithm, use of Tez AMs, caching, and AtScale's Adaptive Cache, users get a nice boost in performance and concurrency out of the box.
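If you want to look at or tune the cache policy, these are the LLAP daemon properties I'd start with (treat this as a hedged sketch and verify the names against your Hive version):
hive.llap.io.enabled (turns the LLAP I/O layer and cache on or off)
hive.llap.io.use.lrfu (selects the LRFU eviction policy, which is the default)
hive.llap.io.lrfu.lambda (weights recency vs. frequency within the LRFU policy)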
03-10-2017
03:56 PM
2 Kudos
Hi @ccasano, current limits are more tightly related to query performance, because queries that take a long time can hold threads open, which only serves to back up other users. So getting queries served from the Adaptive Cache and improving query performance is important. That being said, there will be some clients with enterprise-grade concurrency requirements. When that is the case, the recommendation is to stand up additional AtScale nodes, which synchronize with one another, with a load balancer in front. Of course, this also assumes that the increased concurrency demand can be serviced by the HDP cluster leveraging the Adaptive Cache. Hope this helps.
03-01-2017
06:25 PM
https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.standard.QueryDatabaseTable/
02-23-2017
06:54 PM
5 Kudos
Hi @James Dinkel, I'm guessing there is a memory sizing issue. Make sure you follow these sizing rules (a worked example follows the list):
- MemPerDaemon (container size) > LLAP heap size (Java process heap) + cache size (off-heap) + headroom
- MemPerDaemon should be a multiple of the YARN minimum allocation and less than yarn.nodemanager.resource.memory-mb
- Headroom is capped at 6 GB
- QueueSize (YARN queue) >= MemPerDaemon * number of daemons + Slider AM + (Tez AM size * concurrency)
- CacheSize = MemPerDaemon - (Hive Tez container size * number of executors)
- Number of executors per daemon = (MemPerDaemon - CacheSize) / Hive Tez container size
In addition, be sure your LLAP queue is set up appropriately and has sufficient capacity:
- <queue>.user_limit_factor = 1
- <queue>.maximum-am-resource-percent = 1 (it's actually a factor between 0 and 1)
- <queue>.capacity = 100
- <queue>.max-capacity = 100
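As a purely illustrative worked example (the numbers below are assumptions, not recommendations): suppose yarn.nodemanager.resource.memory-mb = 96 GB, the Hive Tez container size is 4 GB, and you want 12 executors per daemon on 3 daemons, with a 1 GB Slider AM, a 4 GB Tez AM, and a query concurrency of 4. Then, roughly:
- LLAP heap ≈ 12 executors * 4 GB = 48 GB
- MemPerDaemon = 90 GB (a multiple of a 2 GB YARN minimum allocation, below the 96 GB node limit)
- CacheSize ≈ 90 GB - 48 GB heap - 6 GB headroom = 36 GB (working back from the first rule so the headroom is preserved)
- QueueSize >= 3 * 90 GB + 1 GB Slider AM + (4 GB Tez AM * 4 concurrency) = 287 GB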
01-24-2017
03:12 PM
Hi @suresh krish, it appears you may have a Ranger policy preventing access to the table. You can disable Ranger authorization for Hive through Ambari in the Hive configs, or review the Hive Ranger policies and grant the appropriate authorization. This HCC thread has some additional information: https://community.hortonworks.com/questions/64345/how-to-add-another-hiveserver-for-current-metastor.html
01-23-2017
07:44 PM
2 Kudos
Hi @Geetha Anne, our most recent release is HDP 2.5.3. We provide support for a rolling window of the previous two versions, which means we still provide support for HDP 2.3. HDP 2.3 was released on June 8, 2015.
01-18-2017
04:08 PM
@Yasir Faiz I believe LLAP will be GA in HDP 2.6, which is due out sometime in Q2. The advantage of HDP 2.5 would be the ability to test LLAP and Hive 2.0. HDP 2.5 also includes a number of fixes to Hive 1.2.1: http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.5.3/bk_release-notes/content/fixed_issues.html
01-18-2017
03:34 PM
2 Kudos
@Yasir Faiz Hortonworks fully supports Spark, but we do not support Hive on Spark. There are a couple of reasons for this: 1. Performance: Hive on Tez showed a 50x or greater improvement over Hive on MR, while Hive on Spark showed only about a 3x improvement over Hive on MR. 2. Scale: Hive on Tez has been proven to scale to data sets well beyond what Hive on Spark has been shown to handle. Here is a presentation discussing some of the differences: http://www.slideshare.net/hortonworks/hive-on-spark-is-blazing-fast-or-is-it-final With LLAP, Hive speed increases even further, but LLAP is not a replacement for Tez; in fact, LLAP still leverages the Tez engine. Hope this helps!
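For reference (a minimal sketch; Tez is typically already the default engine on HDP), the execution engine is selected per session with:
SET hive.execution.engine=tez;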