Member since: 10-16-2013
Posts: 307
Kudos Received: 77
Solutions: 59
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 11136 | 04-17-2018 04:59 PM |
| | 6108 | 04-11-2018 10:07 PM |
| | 3519 | 03-02-2018 09:13 AM |
| | 22097 | 03-01-2018 09:22 AM |
| | 2615 | 02-27-2018 08:06 AM |
05-20-2015
04:49 PM
Hi Ajey, that is expected behavior. An invalid CAST will return NULL. What was your expectation? Alex
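To illustrate (the table and column names below are purely hypothetical):

```sql
-- An invalid cast does not raise an error in Impala; it evaluates to NULL.
SELECT CAST('not_a_number' AS INT);  -- returns NULL

-- If you need to spot such rows explicitly, something like this works:
SELECT raw_value
FROM my_table   -- hypothetical table
WHERE raw_value IS NOT NULL
  AND CAST(raw_value AS INT) IS NULL;
```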
05-18-2015
09:29 AM
Hi Maarten, in general that approach will not work, but in your specific case it might, depending on the contents of your data files.

Let's take a single HDFS file as an example. You have an HDFS file /mytbl/year=2015/month=1/day=1/example_file. That example_file may contain rows with many 'server' values. My understanding is that you are proposing to move the file into something like /mytbl/year=2015/month=1/day=1/server='foo'/example_file.

Yes, you will be able to create a table over such a structure (with a duplicated 'server' column, one for the partition column and one for the non-partition column), but it doesn't change the fact that the partitioning is wrong if example_file contains data for servers other than 'foo'. If you can guarantee that each of your files only contains data for a single server, then your approach may work. Alex
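To make that concrete, here is a rough sketch of a table over such a layout. The column names are purely illustrative, and note that a data column cannot share a name with a partition column, so the server value stored inside the files needs a different name in the schema:

```sql
-- Layout assumed: /mytbl/year=2015/month=1/day=1/server=foo/example_file
CREATE EXTERNAL TABLE mytbl (
  event_time STRING,        -- illustrative data columns
  server_in_file STRING,    -- the server value stored inside the files
  payload STRING
)
PARTITIONED BY (year INT, month INT, day INT, server STRING)
LOCATION '/mytbl';

-- Each partition directory still has to be registered explicitly:
ALTER TABLE mytbl ADD PARTITION (year=2015, month=1, day=1, server='foo');
```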
05-17-2015
03:23 PM
Hi Maarten, I'm afraid you are correct: you will have to create a new table and INSERT ... SELECT * FROM the old table.

Keep in mind that partition columns are only stored in the corresponding HDFS directory structure, not in the data files themselves. So adding or removing a partition column requires a rewrite of the data (plus/minus the partition column to be added or removed), and the HDFS directory structure must be created for the new set of partition keys.

It would certainly be more convenient to have a single command for this operation, but re-writing the data is necessary for the reasons outlined above. Alex
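A rough sketch of that rewrite (table and column names are hypothetical, the Parquet format is just an example, and here a 'server' partition column is being added):

```sql
-- New table with the desired partitioning.
CREATE TABLE newtable (
  event_time STRING,
  payload STRING
)
PARTITIONED BY (year INT, month INT, day INT, server STRING)
STORED AS PARQUET;

-- Rewrite the data. With dynamic partitioning the partition columns
-- must be the last expressions in the SELECT list, in the same order
-- as in the PARTITION clause.
INSERT OVERWRITE newtable PARTITION (year, month, day, server)
SELECT event_time, payload, year, month, day, server
FROM oldtable;
```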
05-15-2015
12:01 AM
1 Kudo
Hi Ajey, yes, I can see how that would be a problem. I cannot promise any concrete release date, but since this is a behavioral regression, I'd say we should treat it with high priority. I'd appreciate it if you could comment on the Impala JIRA, repeating what you told me: that the return code is the actual issue, and that Impala has regressed in that sense.

Btw, as a workaround, you might try the "--ignore_query_failure" option in the shell. Admittedly not ideal, but maybe there's a way to make it work in your workflow. Alex
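For reference, that option is passed on the impala-shell command line; a hedged example (the script file name is just a placeholder):

```
impala-shell --ignore_query_failure -f /path/to/queries.sql
```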
05-14-2015
05:49 PM
1 Kudo
Hi Ajey, thanks for the update. I was able to get the WARNING in my setup, but not the ERROR case. After some digging in our code, I've found a few interesting things that led me to believe that this warning/error is simply displayed incorrectly.

If you are curious, I believe the core issue lies in the hdfsListDirectory() API call of libhdfs. Impala assumes that when it returns NULL there was an error (because that's what the API doc says). But when reading the libhdfs code I noticed that NULL is also returned if a directory is empty, so Impala will print that warning incorrectly.

We'll keep digging and file JIRAs as appropriate, and I will update this thread then. For now, I think it's safe to say that this message is annoying and wrong, but not dangerous. Alex
05-14-2015
01:30 PM
Hi Ajay, we haven't seen this one, but it could be related to a recent fix: https://issues.cloudera.org/browse/IMPALA-1438

That error typically means that HDFS is running out of file descriptors, e.g., for opening a socket. Do you think it's possible you are running out of file descriptors?

I wasn't able to reproduce the issue locally, so I'm probably missing some steps. Can you provide more details on what your table looks like and how it was created? Ideally a series of steps to reproduce the issue in your setup? Thanks! Alex
05-11-2015
11:08 PM
Yes, that's certainly possible. In CM, select your "Impala Daemon" configuration and look for the option "Impala Daemon Command Line Argument Advanced Configuration Snippet (Safety Valve)". In that box, you can set command line arguments to be passed to the impalad when it starts up. For example, you would add -idle_session_timeout=10 for a 10s timeout. Remember to restart all impalads after changing any such start-up option.
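Concretely, the safety-valve text box would just contain the flag itself, e.g. (the 10-second value is only an illustration):

```
-idle_session_timeout=10
```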
03-20-2015
04:56 PM
1 Kudo
I'm afraid Impala currently does not support variable substitution. One possible workaround is described here: https://www.safaribooksonline.com/library/view/getting-started-with/9781491905760/ch04.html (search for "Variable Substitution").
02-12-2015
09:40 AM
As you've correctly pointed out, Impala currently does not support the BINARY data type, so querying a table with that type does not work in Impala. One idea: you could try altering the column type from BINARY to STRING and then querying the table with Impala. But beware, I have not tried this myself, so no guarantees it will work.
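A sketch of that experiment (table and column names are hypothetical, and the ALTER is shown in Hive syntax since Impala itself does not understand BINARY):

```sql
-- In Hive: change the column's type in the metastore; the data files are not rewritten.
ALTER TABLE my_table CHANGE binary_col binary_col STRING;

-- In Impala: pick up the metadata change, then try querying the column.
INVALIDATE METADATA my_table;
SELECT binary_col FROM my_table LIMIT 10;
```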