Member since 10-16-2013
307 Posts | 77 Kudos Received | 59 Solutions
My Accepted Solutions
Views | Posted |
---|---|
11279 | 04-17-2018 04:59 PM |
6223 | 04-11-2018 10:07 PM |
3572 | 03-02-2018 09:13 AM |
22344 | 03-01-2018 09:22 AM |
2672 | 02-27-2018 08:06 AM |
10-06-2017
04:19 PM
1 Kudo
Thanks for following up with the solution. Sorry for the pain, I understand it's somewhat user unfriendly. The explanation for the current behavior goes like this: Column names are generally case insensitive from the Impala SQL perspective, but HDFS file paths are case sensitive. So it could cause confusion if you had paths like this in HDFS:

YEAR=2000/MONTH=1
year=2000/month=1
Year=2000/Month=1

Are they different partitions? All the same partition? Can one partition point to multiple directories... You see where I am going :). It's just easier to accept one canonical casing.
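To make the ambiguity concrete, here is a minimal Python sketch. It is only an illustration, not how Impala resolves partitions internally; the helper name `canonical_partition` is made up for this example. It lower-cases the column names in each path and shows that all three directories would collapse onto the same partition spec.

```python
from collections import defaultdict


def canonical_partition(path):
    """Hypothetical helper: lower-case the column names of a key=value path.

    Partition values are left untouched; only the column names are folded,
    mirroring the fact that column names are case insensitive in SQL.
    """
    spec = []
    for segment in path.strip("/").split("/"):
        column, _, value = segment.partition("=")
        spec.append((column.lower(), value))
    return tuple(spec)


# The three directory spellings from the post above.
paths = ["YEAR=2000/MONTH=1", "year=2000/month=1", "Year=2000/Month=1"]

# Group the case-sensitive HDFS paths by their case-insensitive partition spec.
by_partition = defaultdict(list)
for p in paths:
    by_partition[canonical_partition(p)].append(p)

for spec, dirs in by_partition.items():
    print(spec, "<-", dirs)
```

Three distinct directories end up under one spec, which is exactly the "can one partition point to multiple directories" question.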
10-06-2017
10:04 AM
Not sure if this is the problem, but you might try using lower case names in the HDFS path, i.e.: year=2017/month=8/day=2 instead of YEAR=2017/MONTH=8/DAY=2
09-28-2017
12:45 PM
1 Kudo
You can do this:

insert overwrite table1 partition(partition_key=1) select * from table1 where partition_key=1;

This process should mostly work as you'd expect. However, there are a few situations where this may cause problems:
- If you run "refresh" or "invalidate metadata" against that table/partition while the insert is still in progress, some queries may see missing or duplicate data from that partition (fix via refresh after the insert).
- Do not run concurrent "insert overwrite" statements against the same partition. You may end up with missing/duplicate data in that partition.

If you can guarantee that the above two situations are not a problem for you, then insert overwrite should work just fine.
09-07-2017
09:22 PM
1 Kudo
Nad1998, that's a different error. It means your 'products' table does not exist or is not visible to Impala (try running 'invalidate metadata products', then retry the query).
08-24-2017
04:01 PM
I'm afraid Impala is not yet able to recognize that only two partitions need to be scanned. We're aware of the gap and that specific optimization is tracked by: https://issues.apache.org/jira/browse/IMPALA-2108

For now, you can manually rewrite your query as suggested in the JIRA as follows:

select id, yyyymmdd, group_id, test
from dwh.table
where ((id='1a' and yyyymmdd=20170815 and group_id=1)
    OR (id='2b' and yyyymmdd=20170811 and group_id=2))
  AND ((yyyymmdd=20170811 and group_id=2)
    OR (yyyymmdd=20170815 and group_id=1))

or alternatively, use a union:

select id, yyyymmdd, group_id, test
from dwh.table
where id='1a' and yyyymmdd=20170815 and group_id=1
union all
select id, yyyymmdd, group_id, test
from dwh.table
where id='2b' and yyyymmdd=20170811 and group_id=2
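As a sanity check that the two rewrites select the same rows, here is a small sketch using Python's built-in sqlite3 module. SQLite is not Impala, so this says nothing about partition pruning or performance; it only checks logical equivalence. The table name `t` and the sample rows are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table t (id text, yyyymmdd integer, group_id integer, test text)"
)

# Made-up sample rows: two that should match, three that should not.
rows = [
    ("1a", 20170815, 1, "x"),
    ("1a", 20170811, 2, "y"),
    ("2b", 20170811, 2, "z"),
    ("2b", 20170815, 1, "w"),
    ("3c", 20170701, 3, "v"),
]
conn.executemany("insert into t values (?, ?, ?, ?)", rows)

# Rewrite 1: original predicate AND-ed with the partition-only predicate.
or_form = """
select id, yyyymmdd, group_id, test from t
where ((id='1a' and yyyymmdd=20170815 and group_id=1)
    or (id='2b' and yyyymmdd=20170811 and group_id=2))
  and ((yyyymmdd=20170811 and group_id=2)
    or (yyyymmdd=20170815 and group_id=1))
"""

# Rewrite 2: one branch per partition, combined with union all.
union_form = """
select id, yyyymmdd, group_id, test from t
where id='1a' and yyyymmdd=20170815 and group_id=1
union all
select id, yyyymmdd, group_id, test from t
where id='2b' and yyyymmdd=20170811 and group_id=2
"""

# Sort both results so the comparison does not depend on row order.
or_rows = sorted(conn.execute(or_form).fetchall())
union_rows = sorted(conn.execute(union_form).fetchall())
print(or_rows == union_rows)
```

The second AND-ed group in the first rewrite is logically redundant; its purpose is to give the planner predicates that mention only the partition columns.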
08-23-2017
08:59 PM
2 Kudos
https://issues.apache.org/jira/browse/IMPALA-1570 That feature has been available since Impala 2.8 (CDH 5.11).
06-19-2017
09:22 AM
HDFS does not know about partitions. That information is stored in the Hive Metastore as part of the other table metadata. A partition of an Impala/Hive table points to a directory in HDFS. The values of partition columns are not stored in data files; they are "stored" in the HDFS directory structure, e.g. hdfs://warehouse/mytable/year=2017/month=6 might be a directory of a partitioned table "mytable" with partition columns year and month.
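To illustrate how the partition values are encoded in the directory name rather than in the data files, here is a rough Python sketch that reads them back out of the example path. The function `partition_values` is hypothetical; in practice the Hive Metastore holds this mapping, and this sketch ignores details such as escaping of special characters in values.

```python
from urllib.parse import urlparse


def partition_values(table_dir, partition_dir):
    """Hypothetical helper: recover column -> value pairs from a partition path."""
    table_path = urlparse(table_dir).path.rstrip("/")
    part_path = urlparse(partition_dir).path.rstrip("/")
    if not part_path.startswith(table_path + "/"):
        raise ValueError("partition directory is not under the table directory")

    # Everything below the table directory is a chain of key=value segments.
    relative = part_path[len(table_path) + 1:]
    values = {}
    for segment in relative.split("/"):
        column, sep, value = segment.partition("=")
        if not sep:
            raise ValueError("not a key=value segment: %r" % segment)
        values[column] = value  # values come back as strings
    return values


result = partition_values(
    "hdfs://warehouse/mytable",
    "hdfs://warehouse/mytable/year=2017/month=6",
)
print(result)
```

Note that the values are plain strings here; the column types live in the table metadata, not in the path.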
06-14-2017
06:04 PM
Yes, very likely there will be a performance difference, but it's hard to say which one will be better without concrete examples.