About alex.behm

thewayofthinkin · ‎01-06-2016

thx 🙂

thewayofthinkin · ‎01-06-2016

Alex, Thank you very much. 🙂 -- Moonwon (Gatsby) Lee gatsbylee.com "Life isn't about waiting for the storm to pass, it's about learning to dance in the rain."

thewayofthinkin · ‎01-04-2016

Thx. 🙂

alex.behm · ‎12-29-2015

Impala does not control the physical locations of the HDFS blocks underlying Impala tables. The tables in Impala are backed by files on HDFS, and those files are chopped into blocks and distributed according to your HDFS configuration. For all practical purposes, though, the blocks are distributed round-robin among the data nodes (grossly simplified). Impala queries typically run on all data nodes that store data relevant to answering a particular query, so given a fixed amount of data, you can indirectly control Impala's degree of (inter-node) parallelism by changing the HDFS block size. More blocks == more parallelism. If you are interested in learning about Impala, you may also find the CIDR paper useful: http://www.cidrdb.org/cidr2015/Papers/CIDR15_Paper28.pdf
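To make the "more blocks == more parallelism" point concrete, here is a rough back-of-the-envelope sketch in Python. The function name, the 10 GiB file size, and the block sizes are illustrative assumptions, not values from this thread, and the block count is only an upper bound on how many nodes can take part in a scan (it is also capped by the number of data nodes you actually have).

```python
def block_count(file_bytes: int, block_bytes: int) -> int:
    """Number of HDFS blocks needed to hold a file of file_bytes.

    Hypothetical helper for illustration; uses integer ceiling division
    so the last, partially filled block is counted too.
    """
    if file_bytes <= 0:
        return 0
    return -(-file_bytes // block_bytes)


MiB = 1024 * 1024
file_size = 10 * 1024 * MiB  # assumed 10 GiB of table data

# Smaller blocks mean more blocks, which can be spread over more nodes.
for block_mib in (64, 128, 256):
    blocks = block_count(file_size, block_mib * MiB)
    print(f"block size {block_mib} MiB: {blocks} blocks")
```

Halving the block size doubles the number of blocks for the same data, which is the lever described above. Whether that actually helps depends on cluster size and per-block overhead.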

dnewberger · ‎12-17-2015

That's what I suspected the answer would be after talking with a coworker, re-reading a Cloudera blog post, and experimenting with Hive. I'll have to check into Kite.

saravana · ‎10-30-2015

The useNativeQuery option is a workaround for this problem. Filed this as a bug in Jira: IMPALA-2609.

alex.behm · ‎10-04-2015

Thanks for the notice, it could be an oversight with the docs. I'll look into it.

alex.behm · ‎08-10-2015

Hi Tom, due to other complications, I'm afraid that patch didn't make it into CDH 5.4.4, but we will include it in CDH 5.4.5 which is tentatively scheduled for the beginning of September. Thanks for your patience, and my apologies that the fix did not make it into CDH 5.4.4. Alex

alex.behm · ‎07-29-2015

Since the differences between the two systems are due to their implementations, I'd say you have the following options:
1. Use a different type, e.g., STRING. When converting from STRING to TIMESTAMP you will encounter the same issues, though.
2. Change your ingestion pipeline to enforce a timestamp range that is valid in both systems. This assumes that a date with year 0000 would be considered "garbage" by your application.
3. Live with the fact that a NULL timestamp could mean it is out of range.
May I ask what exactly is causing the headache? The fact that both systems return different results, or the fact that for your application the year 0000 is a meaningful date?
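As a rough illustration of option 2, an ingestion-side check could look something like the Python sketch below. The function names are hypothetical, and the year bounds (1400 to 9999) are an assumption standing in for whatever range is valid in both of your systems, so please verify them against the documentation of each before relying on them.

```python
from typing import Optional

# ASSUMPTION: placeholder bounds for the range both systems accept.
# Adjust these to the intersection of the two systems' valid ranges.
MIN_YEAR = 1400
MAX_YEAR = 9999


def in_shared_range(ts: str, min_year: int = MIN_YEAR, max_year: int = MAX_YEAR) -> bool:
    """Return True if a 'YYYY-MM-DD ...' string has a year within the bounds.

    Only the four-digit year prefix is inspected; anything that does not
    start with four digits is treated as out of range.
    """
    year_part = ts[:4]
    if len(year_part) != 4 or not year_part.isdigit():
        return False
    return min_year <= int(year_part) <= max_year


def sanitize(ts: str) -> Optional[str]:
    """Pass valid timestamps through; map out-of-range ones to None (NULL)."""
    return ts if in_shared_range(ts) else None


print(sanitize("2015-07-29 10:15:00"))
print(sanitize("0000-01-01 00:00:00"))
```

The year is read from the string directly because Python's own datetime type cannot represent year 0000. Mapping rejected values to None makes the NULL explicit at ingestion time instead of leaving each system to decide on its own.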

alex.behm · ‎07-17-2015

Thanks for the update. I can reproduce the issue, but only when the target partition is empty. As soon as I add some data, compute incremental stats works as expected. So I'm still thinking you are hitting an edge case with an empty partition?

Last Visited	‎05-10-2018 06:52 PM

Member Since	‎10-16-2013 11:04 AM
Posts	307
Kudos received	77

Cloudera Community

Re: External Table from Parquet folder returns emp...

Re: Impala SQL for KUDU does not work

Re: Impalad logs diskspace full

Re: Impala round function does not return expected...

Re: Is Impala a proces engine when I use kudu?

Re: Is there any Impala SQL command which can remo...

Re: what does this mean? - ( overflow when multipl...

Re: How to predict how much memory catalogd needs?

Re: How to distribute impala table partitions

Re: Problems moving data between csv and avro tabl...

Re: Impala JDBC driver does not parse table aliase...

Re: Impala-Shell vs Impala in Hue

Re: ERROR FAILED: SemanticException Class not foun...

Re: How to handle out of range timestamps for impa...

Re: Impala won't update stats on Hive Avro table