Member since: 11-27-2017
Posts: 32
Kudos Received: 1
Solutions: 0
08-29-2019
02:11 AM
It only supports a column list for table-level stats, not for per-partition (incremental) stats. It says in your link: "For non-incremental COMPUTE STATS statement, the columns for which statistics are computed can be specified with an optional comma-separated list of columns." So it looks like column-specific stats are only available with non-incremental COMPUTE STATS, not with incremental (per-partition) stats. It's really strange that it works only this way.
08-29-2019
12:37 AM
@EricL @eMazarakis We have tables with lots of columns that are never used in filters, so I know we don't want to collect statistics on them. The Impala docs say: "For a table with a huge number of partitions and many columns, the approximately 400 bytes of metadata per column per partition can add up to significant memory overhead, as it must be cached on the CatalogD host and on every ImpalaD host that is eligible to be a coordinator. If this metadata for all tables combined exceeds 2 GB, you might experience service downtime." So to me it's strange that users have no option to limit the statistics collected on a table. Hive has this option, but if I use it, the stats won't sync to Impala: "If you run the Hive statement ANALYZE TABLE COMPUTE STATISTICS FOR COLUMNS, Impala can only use the resulting column statistics if the table is unpartitioned. Impala cannot use Hive-generated column statistics for a partitioned table."
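To put the quoted figure in perspective, here is a minimal back-of-the-envelope sketch. It assumes only the documentation's approximate 400 bytes per column per partition; the function name and the example table shape (10,000 partitions, 200 columns) are hypothetical and not taken from our cluster.

```python
# Rough estimate of incremental-stats metadata, based on the approximate
# 400 bytes per column per partition quoted from the Impala docs.
BYTES_PER_COLUMN_PER_PARTITION = 400  # approximate figure, per the docs


def estimate_incremental_stats_bytes(num_partitions: int, num_columns: int) -> int:
    """Return the approximate stats metadata size in bytes for one table."""
    return num_partitions * num_columns * BYTES_PER_COLUMN_PER_PARTITION


# Hypothetical table shape, for illustration only.
size = estimate_incremental_stats_bytes(10_000, 200)
print(f"{size:,} bytes")
```

For this hypothetical table the estimate is 800,000,000 bytes, which is already a large share of the 2 GB limit the docs mention for all tables combined.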
08-27-2019
07:27 AM
Thanks @eMazarakis, but I mean stats on specific partitions AND specific columns (both). If I run it as you suggest, it will collect statistics on all the columns, which we don't want. So how can I collect stats on specific partitions AND specific columns?
08-21-2019
08:17 AM
Hi, I want to gather stats on a big partitioned table, but only on some of the partitions and not on all the columns, because the statistics can get very large. I don't see an option for this in the documentation of "compute incremental stats". How can I run stats only on some of the partitions and some/none of the columns? Thanks
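For reference, the partition half of the question can be sketched as below: Impala's COMPUTE INCREMENTAL STATS accepts a PARTITION clause, so one statement can be generated per partition of interest. This is only a sketch under assumptions: the helper name, the table name `db.big_table`, and the partition column `part_col` are hypothetical, and the values are assumed to be simple trusted strings with no quotes in them. It does not restrict the columns.

```python
def incremental_stats_statements(table, partition_col, values):
    """Build one COMPUTE INCREMENTAL STATS statement per partition value.

    Assumes a single string-typed partition column and trusted values
    (no quoting or escaping is done here).
    """
    return [
        f"COMPUTE INCREMENTAL STATS {table} PARTITION ({partition_col}='{value}');"
        for value in values
    ]


# Hypothetical table and partition column, for illustration only.
for stmt in incremental_stats_statements("db.big_table", "part_col", ["A1", "B2"]):
    print(stmt)
```

The generated statements can then be run through impala-shell or any other client.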
Labels: Apache Impala
03-27-2019
12:14 AM
Hi, it's an internal table with 4 levels of partitions. Still, when I drop the partition, the data and metadata are deleted but the folder is not. Thanks
03-25-2019
12:49 AM
Hi, we have CDH 5.15, and when we drop a partition in Hive, it removes the partition and its data, but the partition's folder is left behind empty and is not removed. How can I change this behavior? Thanks
Labels: Apache Hive
03-14-2019
11:56 AM
Hi @Tim Armstrong,
This is the output of SHOW FILES on the specific partition the query failed on:
hdfs://HadoopCluster/user/database/table_name/partition_value=KS5021/part-m-00000.snappy 2.74GB partition_value=KS5021
hdfs://HadoopCluster/user/database/table_name/partition_value=KS5021/part-m-00001.snappy 3.20GB partition_value=KS5021
hdfs://HadoopCluster/user/database/table_name/partition_value=KS5021/part-m-00002.snappy 3.55GB partition_value=KS5021
hdfs://HadoopCluster/user/database/table_name/partition_value=KS5021/part-m-00003.snappy 3.19GB partition_value=KS5021
This is the version:
impalad version 2.12.0-cdh5.15.1 RELEASE (build 64f4e19bf59fab8664ebff7e80fc70570dcd8cb8) Built on Thu Aug 9 09:21:02 PDT 2018
Thanks
03-13-2019
09:20 AM
Hi, when running a query on one of our tables I'm getting this error:
Disk I/O error: Error seeking to -2147483648 in file: hdfs://Cluster/path/table_name/partition_name=KS5021/part-m-00003.snappy: Error(255): Unknown error 255 Root cause: EOFException: Cannot seek to negative offset.
I'm also getting this message in the impalad log:
tcmalloc: large alloc 3428466688 bytes == 0x110fa000 @ 0x2210458 0xb5e878 0x10586c1 0x10583b4 0xe0633c 0xe064a7 0xe0366b 0xe04d8c 0xe06b52 0xde3954 0xdd45e5 0xdd5ef2 0xd5fdaf 0xd605aa 0x12d7dba
The table is a text table and this file is about 3 GB. What is the problem? Thanks
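One observation about the numbers: the offset in the error, -2147483648, is exactly the smallest value of a signed 32-bit integer, which is what 2^31 wraps to. The sketch below only illustrates that arithmetic. Whether Impala actually hit such an overflow here is an assumption, not something the error message confirms, and the helper name `to_int32` is hypothetical.

```python
import ctypes


def to_int32(value: int) -> int:
    """Interpret the low 32 bits of value as a signed 32-bit integer."""
    return ctypes.c_int32(value).value


print(to_int32(2**31 - 1))  # largest offset that still fits in int32
print(to_int32(2**31))      # one past the limit wraps to a negative number
```

The tcmalloc allocation of 3428466688 bytes is about 3.19 GiB, roughly the size of the file, so both messages involve a size above the 2 GiB that a signed 32-bit value can hold.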
Labels: Apache Impala
03-06-2019
01:54 AM
We don't have any critical issues. We just saw in other systems (Cassandra, Kafka, etc.) that G1GC brought better performance and fewer problems, so we thought of using it for CDH as well, but I see from your answer that it is not a big change. Thanks!
03-05-2019
05:06 AM
Hi, we have CDH 5.15 on our production cluster with JDK 8. What is your recommendation: move to G1GC or stay with CMS? I didn't find any clear instructions about it. Does it depend on the CDH version, or on the components I'm using? Thanks
Labels: Manual Installation