About Bsinghal

Bsinghal · ‎01-14-2016

I ran a select query with a predicate on a non-primary-key column. If I specify the higher value first and then the lower value, the select returns zero rows. I am not sure whether this is a bug or working as expected.

Query: select count(ss_item_sk) from store_sales_1p where ss_sold_date_sk between 2450817 and 2450816
+-------------------+
| count(ss_item_sk) |
+-------------------+
| 0                 |
+-------------------+
Fetched 1 row(s) in 1.58s

Query: select count(ss_item_sk) from store_sales_1p where ss_sold_date_sk between 2450816 and 2450817
+-------------------+
| count(ss_item_sk) |
+-------------------+
| 17212             |
+-------------------+
Fetched 1 row(s) in 1.52s

This happens because of the way the predicate is pushed down: between x and y is converted to >= x and <= y irrespective of the order of x and y, so when x is greater than y no row can satisfy both conditions.

Regards,
Bhaskar
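The behaviour described in this post can be reproduced outside Impala. The following is only a minimal sketch, assuming Python's built-in sqlite3 module as a stand-in engine (SQLite, not Impala) and a tiny made-up table that borrows the name store_sales_1p. The five rows and the helper names count_between and count_between_any_order are hypothetical and exist only for illustration. It shows that BETWEEN with the bounds reversed matches nothing, and that one way around it is to order the bounds before building the query.

```python
import sqlite3


def make_demo_table():
    """Build an in-memory table with a few made-up rows (not the real data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE store_sales_1p (ss_item_sk INTEGER, ss_sold_date_sk INTEGER)"
    )
    rows = [(1, 2450815), (2, 2450816), (3, 2450816), (4, 2450817), (5, 2450818)]
    conn.executemany("INSERT INTO store_sales_1p VALUES (?, ?)", rows)
    return conn


def count_between(conn, first, second):
    """Count rows using BETWEEN with the bounds exactly as given."""
    cur = conn.execute(
        "SELECT COUNT(ss_item_sk) FROM store_sales_1p "
        "WHERE ss_sold_date_sk BETWEEN ? AND ?",
        (first, second),
    )
    return cur.fetchone()[0]


def count_between_any_order(conn, a, b):
    """Order the bounds first, so the caller may pass them either way round."""
    return count_between(conn, min(a, b), max(a, b))


conn = make_demo_table()
print(count_between(conn, 2450816, 2450817))            # 3: low bound first
print(count_between(conn, 2450817, 2450816))            # 0: high bound first
print(count_between_any_order(conn, 2450817, 2450816))  # 3: bounds reordered
```

The zero result in the second call comes from the predicate being evaluated as "value >= 2450817 and value <= 2450816", which no row can satisfy.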

Bsinghal · ‎12-10-2015

Sorry for the late response. I am not looking at any particular use case. I am just trying to see how an impala query is executed when the data is distributed across multiple hdfs data nodes. It's an experimental setup, so performance is currently irrelevant; it's more for getting an in-depth understanding. In the query execution plan I want to observe SCAN_HDFS and AGGREGATION. Regards, Bhaskar

Bsinghal · ‎12-03-2015

Thanks a lot for the reply. Is there some argument/parameter I can specify with create table in impala to ensure HDFS distributes data blocks across multiple data nodes? If not, how do I do this? Regards, Bhaskar p.s. just getting started with hdfs/impala/hadoop/kudu..

Bsinghal · ‎12-02-2015

Is there a way to distribute impala table partitions onto multiple hdfs data nodes without replication? Regards, Bhaskar


Member Since	‎10-07-2015 03:37 AM
Last Visited	‎11-27-2017 11:43 PM
Posts	21

Cloudera Community

impala select with between clause

Re: How to distribute impala table partitions

Re: How to distribute impala table partitions

How to distribute impala table partitions