About hores

hores · ‎08-29-2019

It is supported only for table-level stats, not for per-partition stats (incremental stats). Your link says: "For non-incremental COMPUTE STATS statement, the columns for which statistics are computed can be specified with an optional comma-separated list of columns." So it looks like a column list can be given only for the non-incremental statement, not for incremental (per-partition) stats. It's really strange that it works only this way.
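To make the distinction concrete, here is a minimal sketch that builds the two statement forms as strings. The helper functions and the table, column, and partition names (`sales_db.events`, `user_id`, `event_type`, `year=2019`) are hypothetical and only for illustration. The statement shapes follow the Impala documentation as I read it: the column list goes with the non-incremental form, and the PARTITION clause goes with the incremental form.

```python
def compute_stats(table, columns=None):
    """Build a non-incremental COMPUTE STATS statement.

    The optional column list is accepted only by this non-incremental form.
    """
    stmt = f"COMPUTE STATS {table}"
    if columns:
        stmt += " (" + ", ".join(columns) + ")"
    return stmt


def compute_incremental_stats(table, partition_spec=None):
    """Build a COMPUTE INCREMENTAL STATS statement (takes no column list)."""
    stmt = f"COMPUTE INCREMENTAL STATS {table}"
    if partition_spec:
        stmt += f" PARTITION ({partition_spec})"
    return stmt


# Hypothetical names, for illustration only.
print(compute_stats("sales_db.events", ["user_id", "event_type"]))
print(compute_incremental_stats("sales_db.events", "year=2019"))
```

Neither form lets you pick specific partitions and specific columns in the same statement, which is the limitation described above.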

hores · ‎08-29-2019

@EricL @eMazarakis We have tables with lots of columns that are never used in filters, so I know we don't want to collect statistics on them. The Impala docs say: "For a table with a huge number of partitions and many columns, the approximately 400 bytes of metadata per column per partition can add up to significant memory overhead, as it must be cached on the CatalogD host and on every ImpalaD host that is eligible to be a coordinator. If this metadata for all tables combined exceeds 2 GB, you might experience service downtime." So it seems strange to me that users have no option to minimize the statistics on tables. Hive has this option, but if I use it the stats won't sync to Impala: "If you run the Hive statement ANALYZE TABLE COMPUTE STATISTICS FOR COLUMNS, Impala can only use the resulting column statistics if the table is unpartitioned. Impala cannot use Hive-generated column statistics for a partitioned table."
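As a rough back-of-the-envelope check of the figure quoted from the docs, the sketch below multiplies out the "approximately 400 bytes per column per partition" estimate. The partition and column counts are made-up example values, and treating "2 GB" as 2 GiB is an assumption.

```python
# Approximate per-column, per-partition overhead quoted from the Impala docs.
BYTES_PER_COLUMN_PER_PARTITION = 400
# The 2 GB threshold from the docs, assumed here to mean 2 GiB.
LIMIT_BYTES = 2 * 1024**3


def incremental_stats_bytes(num_partitions, num_columns):
    """Estimate the incremental stats metadata size for one table."""
    return num_partitions * num_columns * BYTES_PER_COLUMN_PER_PARTITION


# Hypothetical table: 50,000 partitions, 100 columns, of which only 10
# are ever used in filters.
all_cols = incremental_stats_bytes(50_000, 100)
few_cols = incremental_stats_bytes(50_000, 10)

print(f"all columns: {all_cols:,} bytes ({all_cols / LIMIT_BYTES:.0%} of limit)")
print(f"10 columns:  {few_cols:,} bytes ({few_cols / LIMIT_BYTES:.0%} of limit)")
```

Under these assumed numbers a single table would take up most of the 2 GB budget, while stats restricted to the filtered columns would need a tenth of that, which is why a per-column option for incremental stats would matter.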

hores · ‎08-27-2019

Thanks @eMazarakis, but I mean stats on specific partitions AND specific columns (BOTH). If I run it as you suggest, it will collect statistics on all the columns, which we don't want. So how can I collect stats on specific partitions AND specific columns?

hores · ‎08-21-2019

Hi, I want to gather stats on a big partitioned table, but only on some of the partitions and not on all the columns, because it can take up a lot of data. I don't see an option for this in the documentation of "compute incremental stats". How can I run stats only on some of the partitions and some/none of the columns? Thanks

hores · ‎03-27-2019

Hi, it's an internal table with 4 levels of partitions. But still, when I remove the partition, the data and metadata are deleted but the folder is not. Thanks

hores · ‎03-25-2019

Hi, we have CDH 5.15, and when we drop a partition in Hive it removes the partition and its data, but the partition's folder is left behind, empty, and is not removed. How can I change this behavior? Thanks

hores · ‎03-14-2019

Hi @Tim Armstrong,

This is the output of SHOW FILES on the specific partition the query failed on:

hdfs://HadoopCluster/user/database/table_name/partition_value=KS5021/part-m-00000.snappy    2.74GB    partition_value=KS5021
hdfs://HadoopCluster/user/database/table_name/partition_value=KS5021/part-m-00001.snappy    3.20GB    partition_value=KS5021
hdfs://HadoopCluster/user/database/table_name/partition_value=KS5021/part-m-00002.snappy    3.55GB    partition_value=KS5021
hdfs://HadoopCluster/user/database/table_name/partition_value=KS5021/part-m-00003.snappy    3.19GB    partition_value=KS5021

This is the version:

impalad version 2.12.0-cdh5.15.1 RELEASE (build 64f4e19bf59fab8664ebff7e80fc70570dcd8cb8)
Built on Thu Aug 9 09:21:02 PDT 2018

Thanks

hores · ‎03-13-2019

Hi, when running a query on one of our tables I'm getting this error:

Disk I/O error: Error seeking to -2147483648 in file: hdfs://Cluster/path/table_name/partition_name=KS5021/part-m-00003.snappy: Error(255): Unknown error 255
Root cause: EOFException: Cannot seek to negative offset.

I'm also getting this error in the impalad log:

tcmalloc: large alloc 3428466688 bytes == 0x110fa000 @ 0x2210458 0xb5e878 0x10586c1 0x10583b4 0xe0633c 0xe064a7 0xe0366b 0xe04d8c 0xe06b52 0xde3954 0xdd45e5 0xdd5ef2 0xd5fdaf 0xd605aa 0x12d7dba

The table is a text table and this file's size is about 3G. What is the problem? Thanks
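One observation about the numbers in these messages, offered as a sketch and not as a confirmed diagnosis: the seek offset -2147483648 is exactly the minimum value of a signed 32-bit integer, which is what an offset of 2^31 becomes if it is stored in a 32-bit signed field. Whether Impala actually does that here is an assumption; `wrap_int32` is a hypothetical helper that only illustrates the arithmetic.

```python
INT32_MIN = -2**31


def wrap_int32(value):
    """Interpret an integer as a signed 32-bit value (two's complement wrap)."""
    return (value + 2**31) % 2**32 - 2**31


# Offsets up to 2**31 - 1 survive; the next byte offset wraps negative.
print(wrap_int32(2**31 - 1))
print(wrap_int32(2**31))

# Size of the large allocation from the tcmalloc log line, in units of 2**30 bytes.
print(3428466688 / 2**30)
```

The allocation size comes out at a little over 3 when divided by 2^30, in line with the roughly 3G file size mentioned above, so both messages are consistent with a single file larger than 2^31 bytes being handled as one buffer.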

hores · ‎03-06-2019

We don't have any critical issues. We just saw in other systems (Cassandra, Kafka, etc.) that G1GC brought better performance and fewer problems, so we thought of using it for CDH as well, but I see from your answer that it is not a big change. Thanks!

hores · ‎03-05-2019

Hi, we have CDH 5.15 on our production cluster with JDK 8. What is your recommendation: move to G1GC or stay with CMS? I didn't find any clear instructions about it. Does it depend on the CDH version, or on the components I'm using? Thanks

Last Visited	‎10-27-2019 03:30 AM

Member Since	‎11-27-2017 05:05 AM
Posts	32
Kudos received	1

Cloudera Community

Re: Impala compute incremental stats on specific c...

Impala compute incremental stats on specific colum...

Re: Drop partition remove data but not HDFS folder

Drop partition remove data but not HDFS folder

Re: Impala query getting EOFException: Cannot seek...

Impala query getting EOFException: Cannot seek to ...

Re: Using G1GC of JDK 8 on Cloudera

Using G1GC of JDK 8 on Cloudera