About alex.behm

alex.behm · ‎10-06-2017

Thanks for following up with the solution. Sorry for the pain; I understand it's somewhat user-unfriendly. The explanation for the current behavior goes like this: column names are generally case-insensitive from the Impala SQL perspective, but HDFS file paths are case-sensitive. So it could cause confusion if you had paths like this in HDFS:

YEAR=2000/MONTH=1
year=2000/month=1
Year=2000/Month=1

Are they different partitions? All the same partition? Can one partition point to multiple directories? You see where I am going :). It's just easier to accept one canonical casing.
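To make the ambiguity concrete, here is a minimal Python sketch. It is not Impala code, and the helper name group_by_canonical is made up for illustration. It lower-cases only the key part of each key=value path component and shows that all three example paths collapse to the same canonical form:

```python
from collections import defaultdict

def group_by_canonical(paths):
    """Group partition paths that differ only in the casing of their keys.

    Hypothetical helper for illustration; this is not how Impala is implemented.
    """
    groups = defaultdict(list)
    for path in paths:
        parts = []
        for component in path.split("/"):
            # Lower-case the partition key, leave the value untouched.
            key, sep, value = component.partition("=")
            parts.append(key.lower() + sep + value)
        groups["/".join(parts)].append(path)
    return dict(groups)

paths = ["YEAR=2000/MONTH=1", "year=2000/month=1", "Year=2000/Month=1"]
groups = group_by_canonical(paths)
for canonical, originals in groups.items():
    print(canonical, "<-", originals)
```

Three distinct HDFS directories map to one case-insensitive partition spec, which is exactly the situation a single canonical casing avoids.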

alex.behm · ‎10-06-2017

Not sure if this is the problem, but you might try using lower-case names in the HDFS path, i.e.

year=2017/month=8/day=2

instead of

YEAR=2017/MONTH=8/DAY=2

alex.behm · ‎09-28-2017

You can do this:

insert overwrite table1 partition(partition_key=1) select * from table1 where partition_key=1;

This process should mostly work as you'd expect. However, there are a few situations where this may cause problems:

- If you run concurrent "refresh" or "invalidate metadata" commands against that table/partition while the insert is still running, some queries may see missing or duplicate data from that partition (fix via refresh after the insert).
- Do not run concurrent "insert overwrite" against the same partition. You may end up with missing/duplicate data in that partition.

If you can guarantee that the above two situations are not a problem for you, then insert overwrite should work just fine.

alex.behm · ‎09-07-2017

I'm afraid there is no such option today.

alex.behm · ‎09-07-2017

Nad1998, that's a different error - it means your 'products' table does not exist or is not visible to Impala (try running 'invalidate metadata products', then retry query).

alex.behm · ‎09-07-2017

Yes

alex.behm · ‎08-24-2017

I'm afraid Impala is not yet able to recognize that only two partitions need to be scanned. We're aware of the gap and that specific optimization is tracked by:

https://issues.apache.org/jira/browse/IMPALA-2108

For now, you can manually rewrite your query as suggested in the JIRA as follows:

select id, yyyymmdd, group_id, test
from dwh.table
where ((id='1a' and yyyymmdd=20170815 and group_id=1)
    OR (id='2b' and yyyymmdd=20170811 and group_id=2))
  AND ((yyyymmdd=20170811 and group_id=2)
    OR (yyyymmdd=20170815 and group_id=1))

or alternatively, use a union:

select id, yyyymmdd, group_id, test
from dwh.table
where id='1a' and yyyymmdd=20170815 and group_id=1
union all
select id, yyyymmdd, group_id, test
from dwh.table
where id='2b' and yyyymmdd=20170811 and group_id=2
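As a sanity check that the two rewrites select the same rows, here is a small sketch using Python's built-in sqlite3 module. SQLite is not Impala and has no partitions, so this says nothing about pruning; it only compares results. The table name t and the sample rows are invented for the example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for dwh.table with made-up rows.
conn.execute("create table t (id text, yyyymmdd integer, group_id integer, test text)")
rows = [
    ("1a", 20170815, 1, "x"),  # wanted
    ("2b", 20170811, 2, "y"),  # wanted
    ("1a", 20170811, 2, "z"),  # right partition values, wrong id
    ("3c", 20170701, 3, "w"),  # unrelated partition values
]
conn.executemany("insert into t values (?, ?, ?, ?)", rows)

# Rewrite 1: repeat the partition-column predicates as a separate conjunct.
rewritten = """
select id, yyyymmdd, group_id, test from t
where ((id='1a' and yyyymmdd=20170815 and group_id=1)
    or (id='2b' and yyyymmdd=20170811 and group_id=2))
  and ((yyyymmdd=20170811 and group_id=2)
    or (yyyymmdd=20170815 and group_id=1))
"""

# Rewrite 2: one select per wanted record, combined with union all.
union = """
select id, yyyymmdd, group_id, test from t
where id='1a' and yyyymmdd=20170815 and group_id=1
union all
select id, yyyymmdd, group_id, test from t
where id='2b' and yyyymmdd=20170811 and group_id=2
"""

rewritten_rows = sorted(conn.execute(rewritten).fetchall())
union_rows = sorted(conn.execute(union).fetchall())
print(rewritten_rows == union_rows)
```

The extra conjunct in the first rewrite is logically redundant; its only purpose is to give the planner a predicate on the partition columns alone.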

alex.behm · ‎08-23-2017

https://issues.apache.org/jira/browse/IMPALA-1570

That feature has been available since Impala 2.8 (CDH 5.11).

alex.behm · ‎06-19-2017

HDFS does not know about partitions. That information is stored in the Hive Metastore along with the rest of the table metadata. A partition of an Impala/Hive table points to a directory in HDFS. The values of partition columns are not stored in the data files; they are "stored" in the HDFS directory structure, e.g. hdfs://warehouse/mytable/year=2017/month=6 might be a directory of a partitioned table "mytable" with partition columns year and month.
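As a rough illustration of how partition values are encoded in the directory structure, this Python sketch recovers them from the example path. The function partition_values is hypothetical and only does string parsing; it does not talk to HDFS or the Hive Metastore:

```python
def partition_values(table_root, partition_dir):
    """Return {column: value} parsed from a partition directory path.

    Hypothetical helper: pure string handling, values are kept as strings.
    """
    if not partition_dir.startswith(table_root):
        raise ValueError("partition directory is not under the table root")
    relative = partition_dir[len(table_root):].strip("/")
    values = {}
    for component in relative.split("/"):
        # Each directory level is expected to look like column=value.
        key, sep, value = component.partition("=")
        if not sep:
            raise ValueError("not a key=value component: %r" % component)
        values[key] = value
    return values

root = "hdfs://warehouse/mytable"
values = partition_values(root, "hdfs://warehouse/mytable/year=2017/month=6")
print(values)
```

The column types are not recoverable from the path alone; they come from the table metadata in the Metastore.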

alex.behm · ‎06-14-2017

Yes, very likely there will be a performance difference, but it's hard to say which one will be better without concrete examples.

Member Since	‎10-16-2013 11:04 AM
Last Visited	‎05-10-2018 06:52 PM
Posts	307
Kudos received	77

Cloudera Community

Re: External Table from Parquet folder returns emp...

Re: Impala SQL for KUDU does not work

Re: Impalad logs diskspace full

Re: Impala round function does not return expected...

Re: Is Impala a proces engine when I use kudu?

Re: Impala Failing to Recognize Partitioning

Re: Impala Failing to Recognize Partitioning

Re: combine small parquet files

Re: Impala Case-Insensitive query

Re: com.cloudera.impala.common.AnalysisException: ...

Re: Insert overwrite partitioned table

Re: Impala query to scan two records in different ...

Re: DROP / COMPUTE incremental stats with dynamic ...

Re: query on partition question

Re: query on partition question