Created 03-29-2017 06:00 AM
Hi,
I had a cluster [CDH 5.8.2] in which I was using Impala and Kudu.
Impala Parcel is downloaded from - http://archive.cloudera.com/beta/impala-kudu/parcels/latest/
I have upgraded this cluster to CDH 5.10 with Cloudera Manager 5.10.
Now, running the select version() query in Impala on this upgraded cluster gives me the details below:
impalad version 2.7.0-cdh5.10.0 RELEASE
However, the CDH 5.10 release notes mention support for Impala 2.8. I cannot find a parcel for Impala 2.8.
Also, running the DELETE command on an Impala table gives me the error below.
"ERROR: AnalysisException: Impala does not support modifying a non-Kudu table: default.impala_testtable"
Questions:
1. Can anybody suggest how I can upgrade to Impala 2.8? Is there a parcel for it, or is the one I'm currently using already the latest?
2. Since running the DELETE command on an Impala table gives me the error above, what is the alternative for deleting data from an existing Impala table? The DELETE command works fine with Kudu tables.
Can anybody please help me with this?
Thanks,
Amit
Created 03-29-2017 06:42 AM
Hi Amit,
Your first question has already been discussed in this thread:
There's a bit of a story there. When we started preparing the 5.10 CDH release, the Apache 2.8 Impala release was not ready, so we had to call it "Impala 2.7" in the version number. Impala 2.8 was officially released after we finished putting together the CDH5.10 release - too late to bump the version in all places.
CDH5.10 Impala is almost exactly the same as 2.8, plus or minus a few patches, so in most of the announcements we've just called it 2.8.
You can find a full list of commits in CDH5.10.0 here: https://github.com/cloudera/Impala/commits/cdh5-2.7.0_5.10.0
The full list of commits in Impala 2.8 are here: https://github.com/apache/incubator-impala/commits/branch-2.8.0
To your second question: Impala indeed does not support the DELETE command for non-Kudu tables. You can use the TRUNCATE command to completely delete all data in a table.
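For example, using the table name from your error message (just a sketch; note that TRUNCATE removes all rows and their data files, there is no row-level filtering):
-- empties the entire table; individual rows cannot be targeted
TRUNCATE TABLE default.impala_testtable;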
Cheers, Lars
Created 03-30-2017 02:50 AM
Thanks Lars for your help.
On Impala DELETE, is there any specific reason DELETE is not supported on non-Kudu Impala tables?
Maybe I'm missing something, but since I have the Kudu service installed on my cluster: in the Cloudera Manager Impala configuration, what is the difference between setting the Kudu service and selecting none? In both cases Kudu queries work fine.
Thank you for your help.
Thanks,
Amit
Created 04-03-2017 06:59 AM
Hi Lars ,
I also installed CDH 5.10.1 hoping to find Impala 2.8 with the fix for COMPUTE STATS on large partitioned tables failing when exceeding the 200MB limit:
"The new configuration setting inc_stats_size_limit_bytes lets you reduce the load on the catalog server when running the COMPUTE INCREMENTAL STATS statement for very large tables"
Do you have a way to resolve it?
It's a big issue: when our big tables do not have statistics, queries run for a long time, and Impala loses points here.
Also, you need to update your documentation:
Impala 2.8 is not in CDH 5.10 and that is confusing.
Thanks
Alon
Created 04-03-2017 12:26 PM
Hi Lars ,
I also installed CDH 5.10.1 hoping to find Impala 2.8 with the new hint SORTBY(cols):
"
A new hint, SORTBY(cols), allows Impala INSERT operations on a Parquet table to produce optimized output files with better compressibility and a more compact range of min/max values within each data file.
"
Thanks,
Gustavo
Created 04-03-2017 04:39 PM
CDH5.10 has essentially all of the Impala 2.8 improvements in it, as mentioned earlier in the thread.
Lars can confirm, but I don't believe the "SORT BY" fix made it into either Impala 2.8 or CDH5.10; I think it got pushed out to the next release. I think the docs are incorrect: https://www.cloudera.com/documentation/enterprise/release-notes/topics/impala_new_features.html#new_...
Created 04-03-2017 04:44 PM
AlonEdi: the incremental stats change should be in CDH5.10. Did you have trouble using it?
Created 04-04-2017 04:05 AM
Hi Tim ,
Same problem.
We cannot go to production with this problem.
Below is an example for a table we have with 4560 partitions and 382 columns.
The incremental statistics fail but full statistics succeed (why???).
BTW, these are empty tables.
It does not happen in CDH 5.4.3, which means we would need to downgrade our CDH version to support huge tables.
CDH 5.9
[gc-dp-pdpprd-data-04.c.bi-environment-1271.internal:21000] > COMPUTE INCREMENTAL STATS test_partitions.dwh_events;
Query: compute INCREMENTAL STATS test_partitions.dwh_events
ERROR: AnalysisException: Incremental stats size estimate exceeds 200.00MB. Please try COMPUTE STATS instead.
[gc-dp-pdpprd-data-04.c.bi-environment-1271.internal:21000] > COMPUTE STATS test_partitions.dwh_events;
Query: compute STATS test_partitions.dwh_events
+----------------------------------------------+
| summary |
+----------------------------------------------+
| Updated 4560 partition(s) and 382 column(s). |
+----------------------------------------------+
Fetched 1 row(s) in 263.45s
[gc-dp-pdpprd-data-04.c.bi-environment-1271.internal:21000] > select version();
Query: select version()
Query submitted at: 2017-04-04 10:43:49 (Coordinator: http://gc-dp-pdpprd-data-04:25000)
Query progress can be monitored at: http://gc-dp-pdpprd-data-04:25000/query_plan?query_id=eb4e9a2e3c6eca6c:242f978800000000
+-----------------------------------------------------------------------------------------+
| version() |
+-----------------------------------------------------------------------------------------+
| impalad version 2.7.0-cdh5.9.0 RELEASE (build 4b4cf1936bd6cdf34fda5e2f32827e7d60c07a9c) |
| Built on Fri Oct 21 01:07:22 PDT 2016 |
+-----------------------------------------------------------------------------------------+
Fetched 1 row(s) in 0.02s
CDH 5.10
compute INCREMENTAL STATS dwh.dwh_events
ERROR: AnalysisException: Incremental stats size estimate exceeds 200.00MB. Please try COMPUTE STATS instead.
[gc-test-impala28-02.c.bi-environment-1271.internal:21000] > COMPUTE STATS dwh.dwh_events;
Query: compute STATS dwh.dwh_events
+----------------------------------------------+
| summary |
+----------------------------------------------+
| Updated 4560 partition(s) and 382 column(s). |
+----------------------------------------------+
Fetched 1 row(s) in 219.37s
[gc-test-impala28-02.c.bi-environment-1271.internal:21000] > select version();
Query: select version()
Query submitted at: 2017-04-04 10:48:20 (Coordinator: http://gc-test-impala28-02:25000)
Query progress can be monitored at: http://gc-test-impala28-02:25000/query_plan?query_id=2f49dded87976155:3610426b00000000
+------------------------------------------------------------------------------------------+
| version() |
+------------------------------------------------------------------------------------------+
| impalad version 2.7.0-cdh5.10.1 RELEASE (build 876895d2a90346e69f2aea02d5528c2125ae7a32) |
| Built on Mon Mar 20 02:28:53 PDT 2017 |
+------------------------------------------------------------------------------------------+
Fetched 1 row(s) in 0.01s
Recommendation: it seems that Impala reads all the data to compute its statistics; it would be good to be able to estimate statistics from a sample (anywhere between 0.xx% and 100% of the data),
so the process would run faster, be less heavy, and produce statistics that are close to the real ones.
Thanks
Alon
Created 04-04-2017 04:32 AM
Hi Alon,
Have you tried the inc_stats_size_limit_bytes command line flag as suggested by Tim? It is supported on CDH5.10.0. Here's the full help text from impalad:
-inc_stats_size_limit_bytes (Maximum size of incremental stats the catalog
is allowed to serialize per table. This limit is set as a safety check,
to prevent the JVM from hitting a maximum array limit of 1GB (or OOM)
while building the thrift objects to send to impalads. By default, it's
set to 200MB) type: int64 default: 209715200
This should allow you to increase the limit you are hitting.
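For example (just a sketch; the exact place to set this depends on your deployment, e.g. the impalad command-line argument safety valve in Cloudera Manager, and 400MB below is an arbitrary higher limit, i.e. 400 * 1024 * 1024 bytes):
--inc_stats_size_limit_bytes=419430400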
Cheers, Lars
Created 04-04-2017 05:03 AM
Thanks Lars,
I looked for this parameter in the Impala configuration.
I added it to the impalad parameters and it's working without error 🙂
What does this message mean:
WARNINGS: Too many partitions selected, doing full recomputation of incremental stats
Did it compute all table partitions or just the ones without statistics?
I use a table without data, so COMPUTE STATS (without incremental) completed in about the same time (previous post: 219.37 seconds).
[gc-test-impala28-02.c.bi-environment-1271.internal:21000] > COMPUTE INCREMENTAL STATS dwh.dwh_events;
Query: compute INCREMENTAL STATS dwh.dwh_events
+----------------------------------------------+
| summary |
+----------------------------------------------+
| Updated 4560 partition(s) and 382 column(s). |
+----------------------------------------------+
WARNINGS: Too many partitions selected, doing full recomputation of incremental stats
Fetched 1 row(s) in 262.02s
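For reference, if only a few partitions change between runs, the incremental stats statement can also be scoped to specific partitions (the partition column name and value below are just placeholders for our real ones):
COMPUTE INCREMENTAL STATS dwh.dwh_events PARTITION (event_day='2017-04-04');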
Thanks
Alon