Member since: 10-16-2013
Posts: 307
Kudos Received: 77
Solutions: 59
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 11136 | 04-17-2018 04:59 PM |
| | 6108 | 04-11-2018 10:07 PM |
| | 3519 | 03-02-2018 09:13 AM |
| | 22097 | 03-01-2018 09:22 AM |
| | 2615 | 02-27-2018 08:06 AM |
05-20-2015
04:49 PM
Hi Ajey, that is expected behavior. An invalid CAST will return NULL. What was your expectation? Alex
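To illustrate (the table and column names below are purely hypothetical):

```sql
-- An invalid cast does not raise an error in Impala; it evaluates to NULL.
SELECT CAST('not_a_number' AS INT);  -- returns NULL

-- If you need to spot such rows explicitly, something like this works:
SELECT raw_value
FROM my_table   -- hypothetical table
WHERE raw_value IS NOT NULL
  AND CAST(raw_value AS INT) IS NULL;
```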
05-18-2015
09:29 AM
Hi Maarten, in general that approach will not work, but in your specific case it might, depending on the contents of your data files.

Let's take a single HDFS file as an example. You have an HDFS file /mytbl/year=2015/month=1/day=1/example_file. That example_file may contain rows with many 'server' values. My understanding is that you are proposing to move the file into something like /mytbl/year=2015/month=1/day=1/server='foo'/example_file.

Yes, you will be able to create a table over such a structure (with a duplicated 'server' column, one for the partition column and one for the non-partition column), but it doesn't change the fact that the partitioning is wrong if example_file contains data for servers other than 'foo'. If you can guarantee that each of your files only contains data for a single server, then your approach may work. Alex
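To make that concrete, here is a rough sketch of a table over such a layout. The column names are purely illustrative, and note that a data column cannot share a name with a partition column, so the server value stored inside the files needs a different name in the schema:

```sql
-- Layout assumed: /mytbl/year=2015/month=1/day=1/server=foo/example_file
CREATE EXTERNAL TABLE mytbl (
  event_time STRING,        -- illustrative data columns
  server_in_file STRING,    -- the server value stored inside the files
  payload STRING
)
PARTITIONED BY (year INT, month INT, day INT, server STRING)
LOCATION '/mytbl';

-- Each partition directory still has to be registered explicitly:
ALTER TABLE mytbl ADD PARTITION (year=2015, month=1, day=1, server='foo');
```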
05-17-2015
03:23 PM
Hi Maarten, I'm afraid you are correct: you will have to create a new table and INSERT ... SELECT * FROM the old table.

Keep in mind that partition columns are only stored in the corresponding HDFS directory structure, not in the data files themselves. So adding or removing a partition column requires a rewrite of the data (plus/minus the partition column to be added or removed), and the HDFS directory structure must be created for the new set of partition keys.

It would certainly be more convenient to have a single command for this operation, but re-writing the data is necessary for the reasons outlined above. Alex
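A rough sketch of that rewrite (table and column names are hypothetical, the Parquet format is just an example, and here a 'server' partition column is being added):

```sql
-- New table with the desired partitioning.
CREATE TABLE newtable (
  event_time STRING,
  payload STRING
)
PARTITIONED BY (year INT, month INT, day INT, server STRING)
STORED AS PARQUET;

-- Rewrite the data. With dynamic partitioning the partition columns
-- must be the last expressions in the SELECT list, in the same order
-- as in the PARTITION clause.
INSERT OVERWRITE newtable PARTITION (year, month, day, server)
SELECT event_time, payload, year, month, day, server
FROM oldtable;
```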
05-15-2015
12:01 AM
1 Kudo
Hi Ajey, yes, I can see how that would be a problem. I cannot promise any concrete release date, but since this is a behavioral regression, I'd say we should treat it with high priority. I'd appreciate it if you could comment on the Impala JIRA, repeating what you told me: that the return code is the actual issue, and that Impala has regressed in that sense.

Btw, as a workaround, you might try the "--ignore_query_failure" option in the shell. Admittedly not ideal, but maybe there's a way to make it work in your workflow. Alex
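For reference, that option is passed on the impala-shell command line; a hedged example (the script file name is just a placeholder):

```
impala-shell --ignore_query_failure -f /path/to/queries.sql
```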
05-14-2015
05:49 PM
1 Kudo
Hi Ajey, thanks for the update. I was able to get the WARNING in my setup, but not the ERROR case. After some digging in our code, I've found a few interesting things that led me to believe that this warning/error is simply displayed incorrectly.

If you are curious, I believe the core issue lies in the hdfsListDirectory() API call of libhdfs. Impala assumes that when it returns NULL there was an error (because that's what the API doc says). But when reading the libhdfs code I noticed that NULL is also returned if a directory is empty, so Impala will print that warning incorrectly.

We'll keep digging and file JIRAs as appropriate, and I will update this thread then. For now, I think it's safe to say that this message is annoying and wrong, but not dangerous. Alex
05-14-2015
01:30 PM
Hi Ajay, we haven't seen this one, but it could be related to a recent fix: https://issues.cloudera.org/browse/IMPALA-1438

That error typically means that HDFS is running out of file descriptors, e.g., for opening a socket. Do you think it's possible you are running out of file descriptors?

I wasn't able to reproduce the issue locally, so I'm probably missing some steps. Can you provide more details on what your table looks like and how it was created? Ideally a series of steps to reproduce the issue in your setup? Thanks! Alex
05-11-2015
11:08 PM
Yes, that's certainly possible. In CM, select your "Impala Daemon" configuration and look for the option "Impala Daemon Command Line Argument Advanced Configuration Snippet (Safety Valve)". In that box, you can set command line arguments to be passed to the impalad when it starts up. For example, you would add -idle_session_timeout=10 for a 10s timeout. Remember to restart all impalads after changing any such start-up option.
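Concretely, the safety-valve text box would just contain the flag itself, e.g. (the 10-second value is only an illustration):

```
-idle_session_timeout=10
```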
03-20-2015
04:56 PM
1 Kudo
I'm afraid Impala currently does not support variable substitution. One possible workaround is described here: https://www.safaribooksonline.com/library/view/getting-started-with/9781491905760/ch04.html (search for "Variable Substitution").
02-12-2015
09:40 AM
As you've correctly pointed out, Impala currently does not support the BINARY data type, so querying a table with that type does not work in Impala. One idea: you could try altering the column type from BINARY to STRING and then querying the table with Impala. But beware, I have not tried this myself, so no guarantees it will work.
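A sketch of that experiment (table and column names are hypothetical, and the ALTER is shown in Hive syntax since Impala itself does not understand BINARY):

```sql
-- In Hive: change the column's type in the metastore; the data files are not rewritten.
ALTER TABLE my_table CHANGE binary_col binary_col STRING;

-- In Impala: pick up the metadata change, then try querying the column.
INVALIDATE METADATA my_table;
SELECT binary_col FROM my_table LIMIT 10;
```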