About alex.behm

alex.behm · ‎12-30-2015

Hi! Your scenario should work. Did you run "invalidate metadata <table>" in Impala after computing the stats in Hive? Also, Impala only handles column stats at the table level, so if you compute the column stats for a specific partition in Hive, those stats will not show up in Impala.

alex.behm · ‎12-29-2015

Impala does not control the physical locations of the HDFS blocks underlying Impala tables. The tables in Impala are backed by files on HDFS, and those files are chopped into blocks and distributed according to your HDFS configuration, but for all practical purposes the blocks are distributed round-robin among the data nodes (grossly simplified). Impala queries typically run on all data nodes that store data relevant to answering a particular query, so given a fixed amount of data, you can indirectly control Impala's degree of (inter-node) parallelism by changing the HDFS block size. More blocks == more parallelism. If you are interested in learning about Impala, you may also find the CIDR paper useful: http://www.cidrdb.org/cidr2015/Papers/CIDR15_Paper28.pdf
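To make the "more blocks == more parallelism" point concrete, here is a rough back-of-the-envelope sketch in Python. It is not how Impala schedules work; it only counts how many blocks a single file of a given size would occupy and caps the result at the number of data nodes. The function names, the 10 GiB data size, and the 50-node cluster are made-up illustration values, and the sketch ignores replication, multiple files, and scheduling details.

```python
def estimate_blocks(data_bytes: int, block_size_bytes: int) -> int:
    """Number of HDFS blocks a single file of data_bytes would occupy."""
    if data_bytes < 0 or block_size_bytes <= 0:
        raise ValueError("sizes must be non-negative, block size must be positive")
    # Integer ceiling division: a partially filled last block still counts.
    return (data_bytes + block_size_bytes - 1) // block_size_bytes


def max_nodes_with_data(data_bytes: int, block_size_bytes: int, num_data_nodes: int) -> int:
    """Upper bound on how many data nodes can hold at least one block.

    Assumes blocks are spread evenly (the simplified round-robin picture)
    and ignores replication, so this is only a rough ceiling.
    """
    return min(estimate_blocks(data_bytes, block_size_bytes), num_data_nodes)


MIB = 1024 ** 2
GIB = 1024 ** 3

# Hypothetical numbers: 10 GiB of data on a 50-node cluster.
blocks_256 = estimate_blocks(10 * GIB, 256 * MIB)
blocks_128 = estimate_blocks(10 * GIB, 128 * MIB)
nodes_256 = max_nodes_with_data(10 * GIB, 256 * MIB, 50)
nodes_128 = max_nodes_with_data(10 * GIB, 128 * MIB, 50)
```

With these assumed numbers, halving the block size doubles the block count, which raises the ceiling on how many nodes can participate until the cluster size becomes the limit.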

alex.behm · ‎12-16-2015

I'm afraid that Impala currently does not support writing Avro data. Even though you can enable experimental support, we strongly advise against that, and instead recommend using another tool such as Hive or Kite to do the conversion. My apologies for the inconvenience.

alex.behm · ‎10-26-2015

Hi Saravana, thanks for your report! Looks like a new issue. Would you mind filing a JIRA for it so we can assign and track it? As a possible workaround, could you try running the query with "use native query" enabled in the JDBC driver? The driver will then send the query to Impala verbatim (otherwise the driver may sometimes make changes to the SQL). http://www.cloudera.com/content/cloudera/en/documentation/connectors/latest/PDF/Cloudera-JDBC-Driver-for-Impala-Install-Guide.pdf Alex

alex.behm · ‎10-04-2015

Thanks for the notice, it could be an oversight with the docs. I'll look into it.

alex.behm · ‎10-02-2015

Impala currently does not support truncating an individual partition, so that syntax error you get from the shell is expected. I am not sure why the statement appears to work from Hue. The TRUNCATE TABLE ... PARTITION syntax is not supported, so I don't see how this would work. Perhaps the error is somehow not properly shown/propagated to Hue?

alex.behm · ‎08-10-2015

Hi Tom, due to other complications, I'm afraid that patch didn't make it into CDH 5.4.4, but we will include it in CDH 5.4.5 which is tentatively scheduled for the beginning of September. Thanks for your patience, and my apologies that the fix did not make it into CDH 5.4.4. Alex

alex.behm · ‎07-29-2015

Since the differences between the two systems are due to their implementation, I'd say you have the following options:
1. Use a different type, e.g., STRING. When converting from STRING to TIMESTAMP you will encounter the same issues though.
2. Change your ingestion pipeline to enforce a timestamp range that is valid in both systems. This assumes that a date with year 0000 would be considered "garbage" by your application.
3. Live with the fact that a NULL timestamp could mean it is out of range.
May I ask what exactly is causing the headache? The fact that both systems return different results, or the fact that for your application the year 0000 is a meaningful date?

alex.behm · ‎07-24-2015

Do you really need dates before the year 1400 or after 10000? Impala has a different supported date range than Hive due to how timestamps are handled internally (Impala uses Boost, Hive uses the Java built-ins).
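If the out-of-range values are garbage for your application, one way to enforce a common range during ingestion could look like the Python sketch below. The bounds (years 1400 through 9999) are an assumption based on the range mentioned in this post, and the function names and the timestamp format are hypothetical; adjust them to your pipeline. Note that Python's own datetime cannot represent year 0000 either, so such strings are rejected at parse time.

```python
from datetime import datetime
from typing import Optional

# Assumed bounds, taken from the range mentioned in the post above.
RANGE_MIN = datetime(1400, 1, 1)
RANGE_MAX = datetime(9999, 12, 31, 23, 59, 59, 999999)

# Hypothetical input format for this illustration.
TS_FORMAT = "%Y-%m-%d %H:%M:%S"


def in_shared_range(ts: datetime) -> bool:
    """True if ts falls inside the assumed common timestamp range."""
    return RANGE_MIN <= ts <= RANGE_MAX


def parse_or_none(text: str) -> Optional[datetime]:
    """Parse a timestamp string; return None if unparseable or out of range.

    Returning None makes the out-of-range case explicit at ingestion time,
    instead of leaving each system to handle it differently later.
    """
    try:
        ts = datetime.strptime(text, TS_FORMAT)
    except ValueError:
        # Covers malformed input and years Python cannot represent (e.g. 0000).
        return None
    return ts if in_shared_range(ts) else None


ok = parse_or_none("2015-07-24 10:00:00")
too_early = parse_or_none("1399-12-31 23:59:59")
year_zero = parse_or_none("0000-01-01 00:00:00")
```

Rows for which this returns None can then be dropped, logged, or routed to a reject file, depending on what your application considers acceptable.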

alex.behm · ‎07-17-2015

Thanks for the update. I can reproduce the issue, but only when the target partition is empty. As soon as I add some data, compute incremental stats works as expected. So I'm still thinking you are hitting an edge case with an empty partition?

Last Visited	‎05-10-2018 06:52 PM

Member Since	‎10-16-2013 11:04 AM
Posts	307
Kudos received	77

Cloudera Community

Re: External Table from Parquet folder returns emp...

Re: Impala SQL for KUDU does not work

Re: Impalad logs diskspace full

Re: Impala round function does not return expected...

Re: Is Impala a proces engine when I use kudu?

Re: why 'show column stats <table_name>` doesn't s...

Re: How to distribute impala table partitions

Re: Problems moving data between csv and avro tabl...

Re: Impala JDBC driver does not parse table aliase...

Re: Impala-Shell vs Impala in Hue

Re: Impala-Shell vs Impala in Hue

Re: ERROR FAILED: SemanticException Class not foun...

Re: How to handle out of range timestamps for impa...

Re: How to handle out of range timestamps for impa...

Re: Impala won't update stats on Hive Avro table