Member since
01-13-2017
40
Posts
0
Kudos Received
2
Solutions
My Accepted Solutions
Title | Views | Posted |
---|---|---|
5938 | 04-07-2017 05:41 AM | |
1942 | 02-22-2017 06:51 AM |
10-19-2017
07:15 PM
There are a couple of places that need tuning at the query level:
1. Table statistics are a must for good performance.
2. When joining two tables, put the smaller table first and the larger table last.
3. You can also use hints to improve query performance.
4. The Hive table's file format is a big factor.
5. Choose carefully when to use partitioning vs. bucketing.
6. Allocate enough memory to HiveServer2 and the metastore.
7. Tune the heap size.
8. Put a load balancer on the host: https://www.cloudera.com/documentation/enterprise/5-9-x/topics/admin_cm_ha_hosts.html#concept_qkr_bfd_pr
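As a rough illustration of points 1 to 5, here is a minimal HiveQL sketch. The table and column names (`sales`, `dim_store`, `sales_bucketed`, `store_id`, `customer_id`, `dt`) are hypothetical, the bucket count is arbitrary, and Parquet is only one possible file format; adjust everything to your own schema and cluster.

```sql
-- Point 1: gather table-level and column-level statistics.
ANALYZE TABLE sales COMPUTE STATISTICS;
ANALYZE TABLE sales COMPUTE STATISTICS FOR COLUMNS;

-- Points 2 and 3: smaller table first, larger table last.
-- The STREAMTABLE hint names the table to stream through the reducers
-- (hypothetical aliases: d = small dimension table, s = large fact table).
SELECT /*+ STREAMTABLE(s) */ d.store_name, SUM(s.amount) AS total_amount
FROM dim_store d
JOIN sales s ON d.store_id = s.store_id
GROUP BY d.store_name;

-- Points 4 and 5: a columnar file format, partitioned on a low-cardinality
-- column that is often filtered on, and bucketed on a high-cardinality
-- column that is often joined on.
CREATE TABLE sales_bucketed (
  store_id    INT,
  customer_id BIGINT,
  amount      DOUBLE
)
PARTITIONED BY (dt STRING)
CLUSTERED BY (customer_id) INTO 32 BUCKETS
STORED AS PARQUET;
```

Whether a hint is honored can depend on the Hive version and optimizer settings, so it is worth checking the plan with EXPLAIN before relying on it.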
04-07-2017
05:41 AM
In fact, the issue was that HBase was not installed.
03-31-2017
02:52 PM
1 Kudo
CANNOT FIND ADDRESS can appear for an executor when it has either been killed (by YARN, for example) or removed by dynamic allocation. It just means that the executor is not there anymore. The message by itself does not mean that anything is wrong.
02-22-2017
06:51 AM
The issue has been fixed by replacing ODBC version 2.5.22 with 2.5.36.
01-26-2017
08:16 AM
I've set hive.prewarm.enabled=true, and it did not reduce the time it takes to start and initialize the executors. Initialization still takes about 15 seconds. Any ideas?
01-17-2017
11:50 AM
On the setting changes: stats, as stated, will help with counts, since that information is precalculated and stored in the metadata. The CBO and stats also help a lot with joins. It is possible that the OS cache had more to do with the improvement if this was a subsequent run with little other activity. You could look at Hive on Spark for better, more consistent performance. Set hive.execution.engine = spark;
On the times: the biggest factor in the delay between job submission and job start is the scheduler. That is a deep topic. It is best to read up on the schedulers, review your settings, and ask any specific questions that come up, preferably in a new topic.
The other factor, not captured in the job stats, is the time it takes to return the results to the client. This varies depending on the client, and there isn't much to do about it. In general, small result sets can be handled by the Hive CLI. You can increase the client heap if needed. Otherwise, use HS2 connections such as Beeline or Hue.
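For reference, the settings discussed above could look like the following session-level sketch. These are standard Hive properties, but their defaults and whether each one helps depend on your Hive/CDH version and workload, so treat the values as assumptions to test rather than recommendations.

```sql
-- Let the cost-based optimizer (CBO) plan joins using table statistics.
SET hive.cbo.enable=true;
SET hive.stats.fetch.column.stats=true;

-- Answer simple aggregates such as COUNT(*) from the statistics stored
-- in the metastore instead of launching a job (requires up-to-date stats).
SET hive.compute.query.using.stats=true;

-- Run queries on Spark instead of MapReduce.
SET hive.execution.engine=spark;
```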
01-16-2017
10:25 PM
1 Kudo
Yes. Cloudera does not support Tez on any CDH version, so it does not ship the Tez jar or put it on the classpath. It will take quite a bit of work to build Tez and maintain it with each CDH release. Here is a link if you are up to it. Otherwise, be satisfied with Hive on Spark or Impala. https://gist.github.com/epiphani/dd37e87acfb2f8c4cbb0