Member since 10-07-2015 | 21 Posts | 1 Kudos Received | 0 Solutions
08-14-2017
07:12 AM
Hello, can you please help me with this query? I want to get one month of data for June, but I am getting errors: where `dm_capacity`.`measured_at` >= 6/31/2017 - interval 1 month
12-29-2015
03:34 PM
Impala does not control the physical locations of the HDFS blocks underlying Impala tables. Impala tables are backed by files on HDFS, and those files are split into blocks and distributed according to your HDFS configuration. For all practical purposes, the blocks are distributed round-robin among the data nodes (grossly simplified). Impala queries typically run on all data nodes that store data relevant to answering a particular query, so for a fixed amount of data you can indirectly control Impala's degree of (inter-node) parallelism by changing the HDFS block size: a smaller block size yields more blocks for the same data. More blocks == more parallelism. If you are interested in learning about Impala, you may also find the CIDR paper useful: http://www.cidrdb.org/cidr2015/Papers/CIDR15_Paper28.pdf
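To make the "more blocks == more parallelism" point concrete, here is a rough back-of-the-envelope sketch in Python. It is not anything Impala or HDFS exposes: the function names `hdfs_block_count` and `max_nodes_with_data` are made up for illustration, and it assumes a single file, idealized round-robin placement, and no replication, which is the same gross simplification as above.

```python
MiB = 1024 ** 2
GiB = 1024 ** 3


def hdfs_block_count(file_size_bytes: int, block_size_bytes: int) -> int:
    """Blocks needed to store one file (hypothetical helper, not an HDFS API)."""
    if block_size_bytes <= 0:
        raise ValueError("block size must be positive")
    if file_size_bytes < 0:
        raise ValueError("file size must not be negative")
    # Integer ceiling division: the last block may be only partially filled.
    return (file_size_bytes + block_size_bytes - 1) // block_size_bytes


def max_nodes_with_data(file_size_bytes: int, block_size_bytes: int,
                        num_data_nodes: int) -> int:
    """Upper bound on data nodes holding a block of the file.

    Assumes idealized round-robin placement and ignores replication,
    so it is an estimate, not a guarantee.
    """
    return min(hdfs_block_count(file_size_bytes, block_size_bytes),
               num_data_nodes)


if __name__ == "__main__":
    # Same 1 GiB of data on a 10-node cluster, two different block sizes.
    for block_size in (128 * MiB, 64 * MiB):
        blocks = hdfs_block_count(1 * GiB, block_size)
        nodes = max_nodes_with_data(1 * GiB, block_size, 10)
        print(f"block size {block_size // MiB} MiB: "
              f"{blocks} blocks, at most {nodes} nodes with data")
```

Under these assumptions, halving the block size doubles the number of blocks, which raises the ceiling on how many data nodes can take part in scanning the same data, up to the size of the cluster.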