Member since 04-07-2016
36 posts, 4 kudos received, 1 solution

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 20644 | 08-01-2016 11:30 AM |
03-05-2018 04:04 PM
Hi, Impyla connects using Kerberos; we are not using LDAP. I have configured the load balancer as described in the docs, but I still get the same error. Thanks.
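For comparison with the failing impala-shell session, a Kerberos connection through Impyla looks roughly like the sketch below. It is a minimal sketch, assuming the impyla package is installed, a valid ticket has been obtained with `kinit`, and the proxy forwards a frontend port to the impalads' HiveServer2 port (21050 is Impala's default HiveServer2 port; the proxy port and the `myHaproxy` host name are assumptions). The helper names are hypothetical.

```python
# Hypothetical helpers around impyla's DB-API entry point, impala.dbapi.connect.
def impyla_kerberos_kwargs(host, port=21050, service_name="impala"):
    """Build keyword arguments for a Kerberos (GSSAPI) impyla connection."""
    return {
        "host": host,                           # e.g. the HAProxy host (assumption)
        "port": port,                           # HiveServer2 port behind the proxy (assumption)
        "auth_mechanism": "GSSAPI",             # Kerberos via SASL/GSSAPI, no LDAP
        "kerberos_service_name": service_name,  # service part of the impalad principal
    }


def connect_via_proxy(host, port=21050):
    """Open an impyla connection; needs impyla installed and a ticket from kinit."""
    from impala.dbapi import connect  # lazy import keeps the helper above dependency-free
    return connect(**impyla_kerberos_kwargs(host, port))
```

Usage: `conn = connect_via_proxy("myHaproxy")`, then `cur = conn.cursor()`, `cur.execute("select 1")` and `cur.fetchall()`.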
02-21-2018 03:48 PM
Hi, I have an Impala cluster with Kerberos and HAProxy, and everything works fine when I connect using Impyla. But when I run `impala-shell -k` (after a kinit) and then `connect myHaproxy:21051;`, I get:
Error: Unable to communicate with impalad service. This service may not be an impalad instance. Check host:port and try again.
Traceback (most recent call last):
  File "/opt/cloudera/parcels/CDH-5.14.0-1.cdh5.14.0.p0.24/bin/../lib/impala-shell/impala_shell.py", line 1554, in <module>
    shell.cmdloop(intro)
  File "/usr/lib/python2.7/cmd.py", line 142, in cmdloop
    stop = self.onecmd(line)
  File "/opt/cloudera/parcels/CDH-5.14.0-1.cdh5.14.0.p0.24/bin/../lib/impala-shell/impala_shell.py", line 563, in onecmd
    return cmd.Cmd.onecmd(self, line)
  File "/usr/lib/python2.7/cmd.py", line 221, in onecmd
    return func(arg)
  File "/opt/cloudera/parcels/CDH-5.14.0-1.cdh5.14.0.p0.24/bin/../lib/impala-shell/impala_shell.py", line 717, in do_connect
    self._connect()
  File "/opt/cloudera/parcels/CDH-5.14.0-1.cdh5.14.0.p0.24/bin/../lib/impala-shell/impala_shell.py", line 764, in _connect
    result = self.imp_client.connect()
  File "/opt/cloudera/parcels/CDH-5.14.0-1.cdh5.14.0.p0.24/lib/impala-shell/lib/impala_client.py", line 245, in connect
    result = self.ping_impala_service()
  File "/opt/cloudera/parcels/CDH-5.14.0-1.cdh5.14.0.p0.24/lib/impala-shell/lib/impala_client.py", line 250, in ping_impala_service
    return self.imp_service.PingImpalaService()
  File "/opt/cloudera/parcels/CDH-5.14.0-1.cdh5.14.0.p0.24/lib/impala-shell/gen-py/ImpalaService/ImpalaService.py", line 223, in PingImpalaService
    return self.recv_PingImpalaService()
  File "/opt/cloudera/parcels/CDH-5.14.0-1.cdh5.14.0.p0.24/lib/impala-shell/gen-py/ImpalaService/ImpalaService.py", line 238, in recv_PingImpalaService
    raise x
thrift.Thrift.TApplicationException: Invalid method name: 'PingImpalaService'
Any idea why? Thanks.
Labels:
- Apache Impala
- Kerberos
04-21-2017 01:32 AM
Thanks! If you open a JIRA, could you send me the link? I will probably disable codegen for now and wait until you push a fix before re-enabling it. Thanks.
04-20-2017 11:10 PM
It seems to be coming from Avro: I created the table as Parquet and the query took 0.48 sec. The table has about 900 columns, so nothing too fancy. Thanks.
04-20-2017 09:56 PM
It is a string that looks like "YYYY-MM-DD", and the table is stored as Avro. I can try using Parquet or text if you want.
04-20-2017 09:08 PM
Hi, I am using Impala 2.7(8) with CDH 5.10.1. I am running a simple query, `select distinct(date_col_partition) from table_1`, and it takes 20 seconds. But when I run `set DISABLE_CODEGEN=true;` first, it takes less than a second. Here is the profile gist:
https://gist.github.com/anonymous/1a5faa3a10d4495f7b8abc3c964457db
Any idea what is going wrong? Thanks.
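One way to reproduce the comparison from a client script is to time the same query with and without the option. This is a minimal sketch, assuming an open impyla cursor: it passes `DISABLE_CODEGEN` through the `configuration` argument of impyla's `cursor.execute`, which sends it as a query option overlay for that one statement. `timed_query` is a hypothetical helper, and wall-clock time here includes fetching the results.

```python
import time


def timed_query(cursor, sql, disable_codegen=False):
    """Run sql on an impyla-style cursor and return (rows, elapsed_seconds)."""
    # Query options passed via `configuration` apply to this statement only.
    config = {"DISABLE_CODEGEN": "true"} if disable_codegen else None
    start = time.perf_counter()
    cursor.execute(sql, configuration=config)
    rows = cursor.fetchall()
    return rows, time.perf_counter() - start
```

Usage: call `timed_query(cur, "select distinct(date_col_partition) from table_1")` once with `disable_codegen=False` and once with `disable_codegen=True`, and compare the elapsed times against the query profiles.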
Labels:
- Apache Impala
11-01-2016 09:31 PM
Hi, a quick question on performance. Suppose I have two tables, table_1 with columns "a, b" and table_2 with columns "c, d", and I create a view like the following:
CREATE VIEW my_view AS (
  select a, b, null AS c, null AS d from table_1
  union
  select null, null, c, d from table_2);
Now if I run a simple query like `select a from my_view`, will the query read only from table_1, or will the entire table_2 also be scanned? (I am mainly worried about disk reads.) Thanks.
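Whether Impala prunes the table_2 scan is a planner question, but the SQL semantics of the view already constrain the answer. The sketch below uses Python's built-in sqlite3 module as a stand-in for Impala, with made-up sample rows, to show that `select a from my_view` still returns NULL rows coming from table_2. It says nothing about Impala's actual scan planning or column pruning; `a_values` is a hypothetical helper.

```python
import sqlite3


def a_values(union_op):
    """Return the values of `select a from my_view` for a view built with union_op."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(f"""
        CREATE TABLE table_1 (a TEXT, b TEXT);
        CREATE TABLE table_2 (c TEXT, d TEXT);
        -- hypothetical sample data; table_2 contains one duplicated row
        INSERT INTO table_1 VALUES ('a1', 'b1'), ('a2', 'b2');
        INSERT INTO table_2 VALUES ('c1', 'd1'), ('c2', 'd2'), ('c2', 'd2');
        CREATE VIEW my_view AS
            SELECT a, b, NULL AS c, NULL AS d FROM table_1
            {union_op}
            SELECT NULL, NULL, c, d FROM table_2;
    """)
    rows = [row[0] for row in conn.execute("SELECT a FROM my_view")]
    conn.close()
    return rows


for op in ("UNION", "UNION ALL"):
    values = a_values(op)
    print(op, "->", values.count(None), "NULL rows contributed by table_2")
```

With plain UNION, the duplicate elimination runs over all four view columns, so the number of NULL rows equals the number of distinct (c, d) pairs in table_2; with UNION ALL it equals table_2's row count. Either way, table_2 contributes rows to the result.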
09-26-2016 12:52 PM
Hi, I upgraded Impala to 2.6, and the aggregation query improved by about 15%. Is there an open ticket, or an expected release date/version, for the "full parallelization"? Thanks.
09-22-2016 07:23 PM
Hi, I will update to 2.6 over the weekend and post the results. I have 32 cores per host available to the Impala daemon. If, as you say, 10 million records are being processed in parallel, I guess that implies only one core is used per host (268M rows / 6 hosts / 4 sec ≈ 11 million rows per second per host). Is it expected to have only one core used per node? Did I miss something in the configuration, or is it because of the multi-threaded aggregation improvement that you are working on? I just want to make sure I didn't miss any obvious optimization. Also, the column is of type "string". Thanks.
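The back-of-the-envelope estimate above can be spelled out as follows. The row count, host count, duration, and core count are the figures from the post; treating the quoted "10 million records" as a per-core, per-second rate is an assumption.

```python
# Figures from the post: 268M rows aggregated across 6 hosts in about 4 seconds.
total_rows = 268_000_000
hosts = 6
seconds = 4
cores_per_host = 32

# Throughput each host achieved.
rows_per_sec_per_host = total_rows / hosts / seconds

# Assumption: the quoted 10 million records is a per-core, per-second rate.
quoted_rate_per_core = 10_000_000
implied_cores = rows_per_sec_per_host / quoted_rate_per_core

# Rate each core would need if all 32 cores on a host were busy.
needed_rate_if_all_cores = rows_per_sec_per_host / cores_per_host

print(f"{rows_per_sec_per_host:,.0f} rows/s per host")
print(f"~{implied_cores:.1f} core(s) busy per host at the quoted rate")
print(f"{needed_rate_if_all_cores:,.0f} rows/s per core if all {cores_per_host} cores were used")
```

The implied figure of roughly one busy core per host is what motivates the question about single-threaded aggregation.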