Member since: 10-16-2013
Posts: 307
Kudos Received: 77
Solutions: 59
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 10342 | 04-17-2018 04:59 PM
 | 5264 | 04-11-2018 10:07 PM
 | 3047 | 03-02-2018 09:13 AM
 | 20416 | 03-01-2018 09:22 AM
 | 2234 | 02-27-2018 08:06 AM
05-17-2016 08:54 AM
Hi! The reason for the speed difference between your two queries is not how many rows are returned, but how many columns the query accesses. Since Parquet is a columnar format, the number of accessed columns makes a huge difference to performance. Your first query does "select *", but your second query only accesses "colA" and "colB". I suspect that if you change your queries to access the same number of columns, you won't see a big speed difference between the two. Alex
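To make the column-count point concrete, here is a toy model in Python. It is only a sketch: the per-column sizes are made up, "colC" and "colD" are hypothetical names, and a real Parquet scan also depends on encoding, compression, and predicate pushdown. The idea is simply that a columnar scan reads only the columns the query touches.

```python
from typing import Dict, List, Optional


def bytes_scanned(column_sizes: Dict[str, int],
                  accessed: Optional[List[str]] = None) -> int:
    """Sum the stored sizes of the accessed columns.

    accessed=None models "select *", i.e. every column is read.
    """
    names = list(column_sizes) if accessed is None else accessed
    return sum(column_sizes[name] for name in names)


# Hypothetical on-disk sizes in bytes for a four-column table.
sizes = {
    "colA": 40_000_000,
    "colB": 80_000_000,
    "colC": 900_000_000,
    "colD": 1_200_000_000,
}

select_star = bytes_scanned(sizes)                    # all four columns
two_columns = bytes_scanned(sizes, ["colA", "colB"])  # only colA and colB
```

With these made-up sizes, the two-column query reads a small fraction of what "select *" reads, regardless of how many rows either query returns.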
05-17-2016 12:32 AM
Sorry you are running into this issue. The check you are hitting is a conservative safety precaution against OOMing in the JVM when serializing an array >1GB. See IMPALA-2648/IMPALA-2664 for details. We use (#columns * #partitions * 400) to estimate the in-memory size of the incremental stats in bytes, so in the short term the only way to avoid this limit is to reduce the number of columns or partitions such that the estimated in-memory size is below the 200MB threshold. I agree with you that ideally we should make the limit configurable, or make changes to dramatically increase it. Filed: https://issues.cloudera.org/browse/IMPALA-3552
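For planning purposes, the estimate above can be sketched in a few lines of Python. This is not Impala's actual code: the function names are hypothetical, and reading "200MB" as 200 * 1024 * 1024 bytes is an assumption.

```python
# Factor from the formula in the post: #columns * #partitions * 400.
BYTES_PER_COLUMN_PER_PARTITION = 400
# Assumption: "200MB" is interpreted here as 200 MiB.
LIMIT_BYTES = 200 * 1024 * 1024


def estimated_stats_bytes(num_columns: int, num_partitions: int) -> int:
    """Estimated in-memory size of the incremental stats."""
    return num_columns * num_partitions * BYTES_PER_COLUMN_PER_PARTITION


def fits_under_limit(num_columns: int, num_partitions: int,
                     limit_bytes: int = LIMIT_BYTES) -> bool:
    """True if the estimate stays strictly below the threshold."""
    return estimated_stats_bytes(num_columns, num_partitions) < limit_bytes


def max_partitions(num_columns: int, limit_bytes: int = LIMIT_BYTES) -> int:
    """Largest partition count whose estimate is still below the threshold."""
    per_partition = num_columns * BYTES_PER_COLUMN_PER_PARTITION
    return (limit_bytes - 1) // per_partition
```

For example, under these assumptions a table with 100 columns and 1,000 partitions is estimated at 40,000,000 bytes, well under the threshold, while 500 columns and 2,000 partitions comes to 400,000,000 bytes and would trip the check.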
03-09-2016 03:57 PM
Unless you have a very strong reason to try a different approach, I'd recommend the star schema model. The benefit is mostly that this model is so prevalent that I'd expect the integration with third-party tools to be smoother than with other "fancier" approaches.
03-08-2016 11:06 PM
Hi Suresh, that solution seems fine to me. Changing the location of a single table with ALTER is atomic, but you won't be able to atomically change the locations of two tables simultaneously. Just something to be aware of. Alex
03-07-2016 11:18 PM
Hi Suresh, even if your use case is slightly different, I'd recommend you take a look at this blog post, which presents best practices and may give you a few ideas: http://blog.cloudera.com/blog/2015/11/how-to-ingest-and-query-fast-data-with-impala-without-kudu/ Alex
03-02-2016 06:21 PM
Indeed there is, and the issue has been fixed! See: https://issues.cloudera.org/browse/IMPALA-2974
You can work around that specific problem in Impala by using the following syntax:
ALTER TABLE A REPLACE COLUMNS (complete list of column definitions)
My apologies for the inconvenience.
02-28-2016 10:09 PM
Hi Lloyd, you need to alter the type of the top-level column, i.e.:
ALTER TABLE existing_parquet CHANGE COLUMN address address STRUCT<street:STRING,city:STRING,house_no:INTEGER>
Unfortunately, there is currently no other way to make alterations to nested fields. Alex
02-23-2016 08:28 AM
Thanks for reporting this issue! I agree that it is an unfortunate limitation. FWIW, this limitation is explained in the docs here: http://www.cloudera.com/documentation/enterprise/latest/topics/impala_complex_types.html#complex_types_limits_unique_2 "Currently, Impala built-in functions and user-defined functions cannot accept complex types as parameters or produce them as function return values"
02-03-2016 04:30 PM
1 Kudo
I agree that it is an important usability concern. I apologize for the inconvenience of having to track that information manually. We filed the following JIRA to track progress on the issue. Thanks for your feedback! https://issues.cloudera.org/browse/IMPALA-2942
02-03-2016 03:04 PM
I'm afraid it is currently not possible to determine the last time COMPUTE STATS was run. One possible workaround is to run count(*) on the current table and compare the result with the numRows recorded in the stats. That does not yield the last time stats were computed, but it shows by how many rows the table has grown. Depending on the ingest pattern, you might also use the lastDdlTime in the table and/or partition properties to get an estimate.
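The comparison in this workaround can be sketched in Python as follows. It assumes you have already fetched the two numbers yourself (the count(*) result and the numRows value from the stats); the function name is hypothetical, and treating a negative numRows as "no stats recorded" is an assumption.

```python
from typing import Optional


def rows_added_since_stats(current_count: int,
                           stats_num_rows: int) -> Optional[int]:
    """Difference between the live row count and the recorded numRows.

    Returns None when stats_num_rows is negative, which this sketch
    assumes means that no stats have been recorded for the table.
    """
    if stats_num_rows < 0:
        return None
    return current_count - stats_num_rows
```

A large positive result suggests the stats are stale and worth recomputing. A negative result would mean rows were removed after the stats were computed.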