About MasterOfPuppets

csguna · ‎10-19-2017

There are a few places that need tuning at the query level:
1. Table statistics are a must for good performance.
2. When joining two tables, put the smaller table first and the larger table last.
3. You can also use hints to improve query performance.
4. The Hive table's file format is a big factor.
5. Choose carefully between partitioning and bucketing.
6. Allocate enough memory to HiveServer2 and the metastore.
7. Check the heap size.
8. Put a load balancer on the host: https://www.cloudera.com/documentation/enterprise/5-9-x/topics/admin_cm_ha_hosts.html#concept_qkr_bfd_pr
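As a rough illustration of the query-level points (statistics, join order, hints, file format, partitioning vs. bucketing), here is a minimal HiveQL sketch. The table and column names (`sales`, `stores`, `sales_opt`, `store_id`, `amount`, `region`, `sale_date`) and the bucket count are hypothetical, chosen only for the example. Parquet is one common columnar choice and not something the post prescribes. Note that Hive ignores `MAPJOIN` hints by default, so the sketch turns them back on first.

```sql
-- Hypothetical tables: sales (large fact table), stores (small dimension table).

-- Statistics: collect table-level and column-level stats for the optimizer.
ANALYZE TABLE sales COMPUTE STATISTICS;
ANALYZE TABLE sales COMPUTE STATISTICS FOR COLUMNS;

-- Hints: MAPJOIN hints are ignored unless this property is set to false.
SET hive.ignore.mapjoin.hint=false;

-- Join order: smaller table first, larger table last.
-- The hint asks Hive to load the small table (alias st) into memory.
SELECT /*+ MAPJOIN(st) */ st.region, SUM(s.amount) AS total_amount
FROM stores st
JOIN sales s ON s.store_id = st.store_id
GROUP BY st.region;

-- File format, partitioning, bucketing:
-- partition on a column commonly used in filters, bucket on the join key,
-- and store the data in a columnar format.
CREATE TABLE sales_opt (
  store_id INT,
  amount   DOUBLE
)
PARTITIONED BY (sale_date STRING)
CLUSTERED BY (store_id) INTO 32 BUCKETS
STORED AS PARQUET;
```

Partitioning tends to pay off when queries filter on a column with a modest number of distinct values, while bucketing suits high-cardinality join keys. The right split depends on your data and queries.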

MasterOfPuppets · ‎04-07-2017

In fact, the issue was that HBase was not installed.

neerjakhattar · ‎03-31-2017

CANNOT FIND ADDRESS can appear for an executor when it has either been killed (by YARN, for example) or removed by dynamic allocation. It just means that the executor is not there anymore. The message does not mean that anything is wrong.

MasterOfPuppets · ‎02-22-2017

The issue has been fixed by replacing ODBC version 2.5.22 with 2.5.36.

MasterOfPuppets · ‎01-26-2017

I've set hive.prewarm.enabled=true, but it did not reduce the latency to start and initialize executors. It still takes about 15 seconds to initialize things. Any ideas?

mbigelow · ‎01-17-2017

On the setting changes: stats, as stated, will help with counts, since that information is precalculated and stored in the metadata. The CBO and stats also help a lot with joins. The OS cache may have more to do with the improvement if this was a subsequent run with little other activity. You could look at Hive on Spark for more consistent performance: SET hive.execution.engine=spark;

On the times: the big impact between job submission and job start is the scheduler. That is a deep topic. It is best to read up on the schedulers, review your settings, and ask any specific questions that come up, preferably in a new topic.

The other factor, not captured in the job stats, is the time it takes to return the results to the client. This varies by client and there isn't much to do about it. In general, small result sets can be handled by the Hive CLI, and you can increase the client heap if needed. Otherwise use HS2 connections like Beeline or Hue.
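For reference, the session-level settings touched on in this reply might look like the sketch below. Only the execution engine setting is quoted from the reply. The CBO and stats properties are my assumption about which Hive properties are meant, so check them against the defaults for your CDH version.

```sql
-- Run the query on Spark instead of MapReduce (from the reply above).
SET hive.execution.engine=spark;

-- Assumed properties for the CBO and stats behaviour described above.
SET hive.cbo.enable=true;                 -- cost-based optimizer
SET hive.compute.query.using.stats=true;  -- answer simple counts from stored stats
SET hive.stats.fetch.column.stats=true;   -- let the optimizer read column stats
```

These only help if statistics have actually been collected on the tables involved.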

MasterOfPuppets · ‎01-17-2017

Thanks for your feedback.

mbigelow · ‎01-16-2017

Yes. Cloudera does not support Tez on any CDH version, so they do not ship the Tez jar or put it on the classpath. It will take quite a bit of work to build Tez and maintain it with each CDH release. Here is a link if you are up to it. Otherwise, be satisfied with Hive on Spark or Impala. https://gist.github.com/epiphani/dd37e87acfb2f8c4cbb0

Last Visited	‎04-14-2017 09:21 AM

Member Since	‎01-13-2017 01:06 PM
Posts	40

Cloudera Community

Re: Issue: Create HBASE table on Hive

Re: Impala ODBC not returning data when WITH claus...

Re: Adding nodes will improve performance ?

Re: Spark : CANNOT FIND ADDRESS

Re: Hive hive.prewarm.enabled property

Re: Hive Queries run slowly

Re: Hive hive.exec.parallel property

Re: Tez Engine not working over CDH 5.8.2