Member since: 10-07-2015
Posts: 21
Kudos Received: 1
Solutions: 0
01-14-2016 08:54 AM
I ran a select query with a predicate on a non-primary-key column. If I specify the higher value first and then the lower value, the select returns zero rows. I am not sure whether this is a bug or whether it is working as expected.

Query: select count(ss_item_sk) from store_sales_1p where ss_sold_date_sk between 2450817 and 2450816
+-------------------+
| count(ss_item_sk) |
+-------------------+
| 0                 |
+-------------------+
Fetched 1 row(s) in 1.58s

Query: select count(ss_item_sk) from store_sales_1p where ss_sold_date_sk between 2450816 and 2450817
+-------------------+
| count(ss_item_sk) |
+-------------------+
| 17212             |
+-------------------+
Fetched 1 row(s) in 1.52s

This happens because of the way predicates are pushed down: between x and y is converted to >= x and <= y, irrespective of the order of x and y. When x is greater than y, no row can satisfy both conditions, so the count is zero.

Regards,
Bhaskar
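The behavior described above can be reproduced outside Impala with a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in engine (an assumption: SQLite is not Impala, but it follows the same standard BETWEEN semantics), and the five sample rows are made up for illustration. The helper names count_between and count_normalized are hypothetical, not part of any Impala API.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for Impala (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE store_sales_1p (ss_item_sk INTEGER, ss_sold_date_sk INTEGER)"
)

# Made-up sample rows: three of them fall on dates 2450816..2450817.
rows = [(1, 2450815), (2, 2450816), (3, 2450816), (4, 2450817), (5, 2450818)]
conn.executemany("INSERT INTO store_sales_1p VALUES (?, ?)", rows)


def count_between(low, high):
    """Count rows with BETWEEN low AND high, i.e. >= low AND <= high."""
    sql = (
        "SELECT count(ss_item_sk) FROM store_sales_1p "
        "WHERE ss_sold_date_sk BETWEEN ? AND ?"
    )
    return conn.execute(sql, (low, high)).fetchone()[0]


def count_normalized(a, b):
    """Hypothetical helper: sort the bounds first so their order does not matter."""
    return count_between(min(a, b), max(a, b))


reversed_count = count_between(2450817, 2450816)       # higher bound first
ordered_count = count_between(2450816, 2450817)        # lower bound first
normalized_count = count_normalized(2450817, 2450816)  # bounds sorted by helper

print(reversed_count, ordered_count, normalized_count)
```

With the bounds reversed the range is empty, so the first count is zero, which matches the Impala output in the post. If the bounds come from user input, sorting them before building the query (as count_normalized does) is one way to avoid the surprise.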
Labels: Apache Impala
12-10-2015 05:31 AM
Sorry for the late response. I am not looking at any particular use case. I am just trying to see how an Impala query is executed when the data is distributed across multiple HDFS data nodes. It's an experimental setup, so performance is currently irrelevant; it's more for getting an in-depth understanding. In the query execution plan I want to observe SCAN_HDFS and AGGREGATION.

Regards,
Bhaskar
12-03-2015 07:49 PM
Thanks a lot for the reply. Is there some argument/parameter I can specify with create table in Impala to ensure that HDFS distributes data blocks across multiple data nodes? If not, how do I do this?

Regards,
Bhaskar

P.S. I am just getting started with HDFS/Impala/Hadoop/Kudu.
12-02-2015 06:41 PM
Is there a way to distribute Impala table partitions onto multiple HDFS data nodes without replication?

Regards,
Bhaskar
Labels: Apache Impala, HDFS