About hores

AbelUD · ‎02-02-2022

The problem we have is with "Last row fetched": it takes the longest time when executing a query, which slows down response times. Can you help us with some information on how to optimize these times? For example:
Rows available: 510ms (499ms)
First row fetched: 1.97s (1.46s)
Last row fetched: 1.97s (57.35us)
We have Impala version 2.12, cdh5.16.2.
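To see which step of the timeline quoted in this post actually dominates, the three events can be parsed and compared. This is a minimal sketch, assuming the value in parentheses is the time elapsed since the previous event (the numbers are consistent with that reading: 1.97s - 510ms = 1.46s). The helper names `parse_timeline` and `slowest_step` are hypothetical and not part of any Impala tooling.

```python
import re

# Timeline text copied from the post: "event: elapsed (delta since previous event)".
TIMELINE = (
    "Rows available: 510ms (499ms) "
    "First row fetched: 1.97s (1.46s) "
    "Last row fetched: 1.97s (57.35us)"
)

# Multipliers that convert each unit suffix to seconds.
UNIT_SECONDS = {"us": 1e-6, "ms": 1e-3, "s": 1.0}

# One event: a name, a colon, an elapsed time, and a delta in parentheses.
EVENT_RE = re.compile(
    r"([A-Za-z ]+?):\s*([\d.]+)(us|ms|s)\s*\(([\d.]+)(us|ms|s)\)"
)


def parse_timeline(text):
    """Return a list of (event name, elapsed seconds, delta seconds)."""
    events = []
    for name, total, total_unit, delta, delta_unit in EVENT_RE.findall(text):
        events.append((
            name.strip(),
            float(total) * UNIT_SECONDS[total_unit],
            float(delta) * UNIT_SECONDS[delta_unit],
        ))
    return events


def slowest_step(events):
    """Return the event with the largest delta since the previous event."""
    return max(events, key=lambda event: event[2])


if __name__ == "__main__":
    for name, elapsed, delta in parse_timeline(TIMELINE):
        print(f"{name}: elapsed={elapsed:.6f}s delta={delta:.6f}s")
    print("Largest step:", slowest_step(parse_timeline(TIMELINE))[0])
```

Under that assumption, the largest gap in this sample is the 1.46s between "Rows available" and "First row fetched", while "Last row fetched" adds only 57.35us.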

Tim Armstrong · ‎09-20-2019

"So it looks like column specific is only on a table without partitions (non-incremental)"

@hores that's incorrect. Non-incremental compute stats works on partitioned tables and is generally the preferred method for collecting stats on partitioned tables. We've generally tried to steer people away from incremental stats because of the size issues on large tables. It would also be error-prone to use correctly and complex to implement: what happens if you compute incremental stats with different subsets of the columns? You can end up with different subsets of the columns on different partitions, and then you have to somehow reconcile it all each time.

EricL · ‎03-31-2019

Hi, I assume that you are working with a managed table rather than an external table? This could be because the user who ran the DROP command lacks the permissions needed to remove the underlying HDFS path. Check the HMS server log to see if you can find any error messages.

hores · ‎03-14-2019

Hi @Tim Armstrong,

This is the output of SHOW FILES on the specific partition the query failed on (it failed on):
hdfs://HadoopCluster/user/database/table_name/partition_value=KS5021/part-m-00000.snappy 2.74GB partition_value=KS5021
hdfs://HadoopCluster/user/database/table_name/partition_value=KS5021/part-m-00001.snappy 3.20GB partition_value=KS5021
hdfs://HadoopCluster/user/database/table_name/partition_value=KS5021/part-m-00002.snappy 3.55GB partition_value=KS5021
hdfs://HadoopCluster/user/database/table_name/partition_value=KS5021/part-m-00003.snappy 3.19GB partition_value=KS5021

This is the version:
impalad version 2.12.0-cdh5.15.1 RELEASE (build 64f4e19bf59fab8664ebff7e80fc70570dcd8cb8) Built on Thu Aug 9 09:21:02 PDT 2018

Thanks

hores · ‎03-06-2019

We don't have any critical issues. We just saw in other systems (Cassandra, Kafka, etc.) that G1GC brought better performance and fewer problems, so we thought we would use it for CDH as well, but I see from your answer that it is not a big change. Thanks!

Consult · ‎03-06-2019

Hello @hores, I am not sure which use case leads you to opt for G1. But as per most of the GC tests, CMS still seems to be the best and default option. G1 may have improved latency, but throughput is still a challenge in many tested scenarios. Also, you may review tuning GC for HBase. Hope that helps.

hores · ‎01-02-2018

It's stuck in a CREATED state (if I remember correctly, at one time I could get to the daemon page). I don't remember about other queries, because when I checked, only this query was on the daemon and the daemon page was stuck in my browser. My daemon has exactly 16GB. Thanks, I'll try that. Is there any other tracing tool I can use to check the daemon or the query?

hores · ‎12-31-2017

Hi, Ok, we will check that. Mauricio, thanks for the detailed answer!

Last Visited	‎10-27-2019 03:30 AM

Member Since	‎11-27-2017 05:05 AM
Posts	32
Kudos received	1

Cloudera Community

Re: Impala "Rows available" in Query Timeline

Re: Impala compute incremental stats on specific c...

Re: Drop partition remove data but not HDFS folder

Re: Impala query getting EOFException: Cannot seek...

Re: Using G1GC of JDK 8 on Cloudera

Re: Using G1GC of JDK 8 on Cloudera

Re: Debugging query in Impala

Re: Locking in Impala