Member since
04-07-2016
36
Posts
4
Kudos Received
1
Solution
My Accepted Solutions
Title | Views | Posted |
---|---|---|
14173 | 08-01-2016 11:30 AM |
03-23-2020
07:47 PM
Hi, I am using CDH 6.3.2, and I am currently implementing a job that syncs a folder from HDFS to S3 daily. This folder can contain new or modified files, but the -update option doesn't seem to be working: all the files in my "test" folder are getting rewritten every time. For example, if I run this command once: hadoop distcp -update /user/maurin/test s3a://test_bucket/test/ ERROR: Tools helper /opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/bin/../lib/hadoop/libexec//tools/hadoop-distcp.sh was not found. 20/03/23 19:36:37 WARN impl.MetricsConfig: Cannot locate configuration: tried hadoop-metrics2-s3a-file-system.properties,hadoop-metrics2.properties 20/03/23 19:36:37 INFO impl.MetricsSystemImpl: Scheduled Metric snapshot period at 10 second(s). 20/03/23 19:36:37 INFO impl.MetricsSystemImpl: s3a-file-system metrics system started 20/03/23 19:36:39 INFO Configuration.deprecation: fs.s3a.server-side-encryption-key is deprecated. Instead, use fs.s3a.server-side-encryption.key 20/03/23 19:36:40 INFO tools.DistCp: Input Options: DistCpOptions{atomicCommit=false, syncFolder=true, deleteMissing=false, ignoreFailures=false, overwrite=false, append=false, useDiff=false, useRdiff=false, fromSnapshot=null, toSnapshot=null, skipCRC=false, blocking=true, numListstatusThreads=0, maxMaps=20, mapBandwidth=0.0, copyStrategy='uniformsize', preserveStatus=[BLOCKSIZE], atomicWorkPath=null, logPath=null, sourceFileListing=null, sourcePaths=[/user/maurin/test], targetPath=s3a://test_bucket/test, filtersFile='null', blocksPerChunk=0, copyBufferSize=8192, verboseLog=false}, sourcePaths=[/user/maurin/test], targetPathExists=true, preserveRawXattrsfalse 20/03/23 19:36:42 INFO hdfs.DFSClient: Created token for maurin: HDFS_DELEGATION_TOKEN owner=maurin/lore_staff@net.getlore.io, renewer=yarn, realUser=, issueDate=1585017402455, maxDate=1585622202455, sequenceNumber=32271, masterKeyId=886 on ha-hdfs:nameservice1 20/03/23 19:36:42 INFO security.TokenCache: Got dt for hdfs://nameservice1; 
Kind: HDFS_DELEGATION_TOKEN, Service: ha-hdfs:nameservice1, Ident: (token for maurin: HDFS_DELEGATION_TOKEN owner=maurin/lore_staff@net.getlore.io, renewer=yarn, realUser=, issueDate=1585017402455, maxDate=1585622202455, sequenceNumber=32271, masterKeyId=886) 20/03/23 19:36:42 INFO tools.SimpleCopyListing: Paths (files+dirs) cnt = 4; dirCnt = 1 20/03/23 19:36:42 INFO tools.SimpleCopyListing: Build file listing completed. 20/03/23 19:36:42 INFO Configuration.deprecation: io.sort.mb is deprecated. Instead, use mapreduce.task.io.sort.mb 20/03/23 19:36:42 INFO Configuration.deprecation: io.sort.factor is deprecated. Instead, use mapreduce.task.io.sort.factor 20/03/23 19:36:42 INFO tools.DistCp: Number of paths in the copy list: 4 20/03/23 19:36:42 INFO tools.DistCp: Number of paths in the copy list: 4 20/03/23 19:36:43 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm756 20/03/23 19:36:43 INFO mapreduce.JobResourceUploader: Disabling Erasure Coding for path: /user/maurin/.staging/job_1584390517558_0074 20/03/23 19:36:43 INFO mapreduce.JobSubmitter: number of splits:3 20/03/23 19:36:43 INFO Configuration.deprecation: yarn.resourcemanager.zk-address is deprecated. Instead, use hadoop.zk.address 20/03/23 19:36:43 INFO Configuration.deprecation: yarn.resourcemanager.system-metrics-publisher.enabled is deprecated. Instead, use yarn.system-metrics-publisher.enabled 20/03/23 19:36:43 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1584390517558_0074 20/03/23 19:36:43 INFO mapreduce.JobSubmitter: Executing with tokens: [Kind: HDFS_DELEGATION_TOKEN, Service: ha-hdfs:nameservice1, Ident: (token for maurin: HDFS_DELEGATION_TOKEN owner=maurin/lore_staff@net.getlore.io, renewer=yarn, realUser=, issueDate=1585017402455, maxDate=1585622202455, sequenceNumber=32271, masterKeyId=886)] 20/03/23 19:36:43 INFO conf.Configuration: resource-types.xml not found 20/03/23 19:36:43 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'. 
20/03/23 19:36:44 INFO impl.YarnClientImpl: Submitted application application_1584390517558_0074 20/03/23 19:36:44 INFO mapreduce.Job: The url to track the job: http://cdhmaster3.net.cuberonlabs.com:8088/proxy/application_1584390517558_0074/ 20/03/23 19:36:44 INFO tools.DistCp: DistCp job-id: job_1584390517558_0074 20/03/23 19:36:44 INFO mapreduce.Job: Running job: job_1584390517558_0074 20/03/23 19:36:52 INFO mapreduce.Job: Job job_1584390517558_0074 running in uber mode : false 20/03/23 19:36:52 INFO mapreduce.Job: map 0% reduce 0% 20/03/23 19:37:11 INFO mapreduce.Job: map 84% reduce 0% 20/03/23 19:37:13 INFO mapreduce.Job: map 100% reduce 0% 20/03/23 19:37:22 INFO mapreduce.Job: Job job_1584390517558_0074 completed successfully 20/03/23 19:37:22 INFO mapreduce.Job: Counters: 43 File System Counters FILE: Number of bytes read=0 FILE: Number of bytes written=694053 FILE: Number of read operations=0 FILE: Number of large read operations=0 FILE: Number of write operations=0 HDFS: Number of bytes read=1656 HDFS: Number of bytes written=0 HDFS: Number of read operations=35 HDFS: Number of large read operations=0 HDFS: Number of write operations=6 HDFS: Number of bytes read erasure-coded=0 S3A: Number of bytes read=0 S3A: Number of bytes written=4 S3A: Number of read operations=44 S3A: Number of large read operations=0 S3A: Number of write operations=33 Job Counters Launched map tasks=3 Other local map tasks=3 Total time spent by all maps in occupied slots (ms)=268400 Total time spent by all reduces in occupied slots (ms)=0 Total time spent by all map tasks (ms)=53680 Total vcore-milliseconds taken by all map tasks=429440 Total megabyte-milliseconds taken by all map tasks=274841600 Map-Reduce Framework Map input records=4 Map output records=0 Input split bytes=354 Spilled Records=0 Failed Shuffles=0 Merged Map outputs=0 GC time elapsed (ms)=364 CPU time spent (ms)=19010 Physical memory (bytes) snapshot=1625214976 Virtual memory (bytes) snapshot=18979409920 Total 
committed heap usage (bytes)=6963068928 Peak Map Physical memory (bytes)=556732416 Peak Map Virtual memory (bytes)=6332137472 File Input Format Counters Bytes Read=1298 File Output Format Counters Bytes Written=0 DistCp Counters Bandwidth in Btyes=0 Bytes Copied=4 Bytes Expected=4 Files Copied=3 DIR_COPY=1 20/03/23 19:37:22 INFO impl.MetricsSystemImpl: Stopping s3a-file-system metrics system... 20/03/23 19:37:22 INFO impl.MetricsSystemImpl: s3a-file-system metrics system stopped. 20/03/23 19:37:22 INFO impl.MetricsSystemImpl: s3a-file-system metrics system shutdown complete. We can see that it copied 3 files. Then If I trigger it again: distcp -update /user/maurin/test s3a://test_bucket/test/ ERROR: Tools helper /opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/bin/../lib/hadoop/libexec//tools/hadoop-distcp.sh was not found. 20/03/23 19:41:38 WARN impl.MetricsConfig: Cannot locate configuration: tried hadoop-metrics2-s3a-file-system.properties,hadoop-metrics2.properties 20/03/23 19:41:38 INFO impl.MetricsSystemImpl: Scheduled Metric snapshot period at 10 second(s). 20/03/23 19:41:38 INFO impl.MetricsSystemImpl: s3a-file-system metrics system started 20/03/23 19:41:41 INFO Configuration.deprecation: fs.s3a.server-side-encryption-key is deprecated. 
Instead, use fs.s3a.server-side-encryption.key 20/03/23 19:41:41 INFO tools.DistCp: Input Options: DistCpOptions{atomicCommit=false, syncFolder=true, deleteMissing=false, ignoreFailures=false, overwrite=false, append=false, useDiff=false, useRdiff=false, fromSnapshot=null, toSnapshot=null, skipCRC=false, blocking=true, numListstatusThreads=0, maxMaps=20, mapBandwidth=0.0, copyStrategy='uniformsize', preserveStatus=[BLOCKSIZE], atomicWorkPath=null, logPath=null, sourceFileListing=null, sourcePaths=[/user/maurin/test], targetPath=s3a://test_bucket/test, filtersFile='null', blocksPerChunk=0, copyBufferSize=8192, verboseLog=false}, sourcePaths=[/user/maurin/test], targetPathExists=true, preserveRawXattrsfalse 20/03/23 19:41:43 INFO hdfs.DFSClient: Created token for maurin: HDFS_DELEGATION_TOKEN owner=maurin/lore_staff@net.getlore.io, renewer=yarn, realUser=, issueDate=1585017703760, maxDate=1585622503760, sequenceNumber=32272, masterKeyId=886 on ha-hdfs:nameservice1 20/03/23 19:41:43 INFO security.TokenCache: Got dt for hdfs://nameservice1; Kind: HDFS_DELEGATION_TOKEN, Service: ha-hdfs:nameservice1, Ident: (token for maurin: HDFS_DELEGATION_TOKEN owner=maurin/lore_staff@net.getlore.io, renewer=yarn, realUser=, issueDate=1585017703760, maxDate=1585622503760, sequenceNumber=32272, masterKeyId=886) 20/03/23 19:41:44 INFO tools.SimpleCopyListing: Paths (files+dirs) cnt = 4; dirCnt = 1 20/03/23 19:41:44 INFO tools.SimpleCopyListing: Build file listing completed. 20/03/23 19:41:44 INFO Configuration.deprecation: io.sort.mb is deprecated. Instead, use mapreduce.task.io.sort.mb 20/03/23 19:41:44 INFO Configuration.deprecation: io.sort.factor is deprecated. 
Instead, use mapreduce.task.io.sort.factor 20/03/23 19:41:44 INFO tools.DistCp: Number of paths in the copy list: 4 20/03/23 19:41:44 INFO tools.DistCp: Number of paths in the copy list: 4 20/03/23 19:41:44 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm756 20/03/23 19:41:44 INFO mapreduce.JobResourceUploader: Disabling Erasure Coding for path: /user/maurin/.staging/job_1584390517558_0075 20/03/23 19:41:44 INFO mapreduce.JobSubmitter: number of splits:2 20/03/23 19:41:44 INFO Configuration.deprecation: yarn.resourcemanager.zk-address is deprecated. Instead, use hadoop.zk.address 20/03/23 19:41:44 INFO Configuration.deprecation: yarn.resourcemanager.system-metrics-publisher.enabled is deprecated. Instead, use yarn.system-metrics-publisher.enabled 20/03/23 19:41:45 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1584390517558_0075 20/03/23 19:41:45 INFO mapreduce.JobSubmitter: Executing with tokens: [Kind: HDFS_DELEGATION_TOKEN, Service: ha-hdfs:nameservice1, Ident: (token for maurin: HDFS_DELEGATION_TOKEN owner=maurin/lore_staff@net.getlore.io, renewer=yarn, realUser=, issueDate=1585017703760, maxDate=1585622503760, sequenceNumber=32272, masterKeyId=886)] 20/03/23 19:41:45 INFO conf.Configuration: resource-types.xml not found 20/03/23 19:41:45 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'. 
20/03/23 19:41:45 INFO impl.YarnClientImpl: Submitted application application_1584390517558_0075 20/03/23 19:41:45 INFO mapreduce.Job: The url to track the job: http://cdhmaster3.net.cuberonlabs.com:8088/proxy/application_1584390517558_0075/ 20/03/23 19:41:45 INFO tools.DistCp: DistCp job-id: job_1584390517558_0075 20/03/23 19:41:45 INFO mapreduce.Job: Running job: job_1584390517558_0075 20/03/23 19:41:55 INFO mapreduce.Job: Job job_1584390517558_0075 running in uber mode : false 20/03/23 19:41:55 INFO mapreduce.Job: map 0% reduce 0% 20/03/23 19:42:14 INFO mapreduce.Job: map 65% reduce 0% 20/03/23 19:42:24 INFO mapreduce.Job: map 100% reduce 0% 20/03/23 19:42:33 INFO mapreduce.Job: Job job_1584390517558_0075 completed successfully 20/03/23 19:42:33 INFO mapreduce.Job: Counters: 43 File System Counters FILE: Number of bytes read=0 FILE: Number of bytes written=462696 FILE: Number of read operations=0 FILE: Number of large read operations=0 FILE: Number of write operations=0 HDFS: Number of bytes read=1259 HDFS: Number of bytes written=0 HDFS: Number of read operations=27 HDFS: Number of large read operations=0 HDFS: Number of write operations=4 HDFS: Number of bytes read erasure-coded=0 S3A: Number of bytes read=0 S3A: Number of bytes written=4 S3A: Number of read operations=43 S3A: Number of large read operations=0 S3A: Number of write operations=21 Job Counters Launched map tasks=2 Other local map tasks=2 Total time spent by all maps in occupied slots (ms)=217900 Total time spent by all reduces in occupied slots (ms)=0 Total time spent by all map tasks (ms)=43580 Total vcore-milliseconds taken by all map tasks=348640 Total megabyte-milliseconds taken by all map tasks=223129600 Map-Reduce Framework Map input records=4 Map output records=0 Input split bytes=234 Spilled Records=0 Failed Shuffles=0 Merged Map outputs=0 GC time elapsed (ms)=323 CPU time spent (ms)=14370 Physical memory (bytes) snapshot=1128972288 Virtual memory (bytes) snapshot=12650110976 Total 
committed heap usage (bytes)=4581752832 Peak Map Physical memory (bytes)=569790464 Peak Map Virtual memory (bytes)=6332850176 File Input Format Counters Bytes Read=1021 File Output Format Counters Bytes Written=0 DistCp Counters Bandwidth in Btyes=0 Bytes Copied=4 Bytes Expected=4 Files Copied=3 DIR_COPY=1 20/03/23 19:42:33 INFO impl.MetricsSystemImpl: Stopping s3a-file-system metrics system... 20/03/23 19:42:33 INFO impl.MetricsSystemImpl: s3a-file-system metrics system stopped. 20/03/23 19:42:33 INFO impl.MetricsSystemImpl: s3a-file-system metrics system shutdown complete. We can see that it did the same operation again: Files Copied=3. Am I missing anything to copy only the newly created files and replace the modified ones? thanks
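For context, here is a simplified sketch (in Python, not the actual Hadoop code, which lives in CopyMapper/DistCpUtils) of how -update decides whether to skip a file: the target must exist, sizes must match, and, unless -skipcrccheck is set, checksums must match too. Since HDFS exposes CRC-based checksums and S3A does not expose comparable ones, the checksum test can fail even for identical files, which would explain the recopies. This is an assumption about the cause, not a verified diagnosis:

```python
def should_copy(src_size, dst_size, src_checksum, dst_checksum, skip_crc=False):
    """Simplified sketch of distcp -update's skip decision (hypothetical helper)."""
    if dst_size is None:          # target file missing: always copy
        return True
    if src_size != dst_size:      # size changed: copy
        return True
    if skip_crc:                  # sizes match and -skipcrccheck given: skip
        return False
    # HDFS CRC vs. S3A: checksums are not comparable, so this tends to differ
    return src_checksum != dst_checksum

# Incomparable checksums: file is recopied even though sizes match.
print(should_copy(4, 4, "hdfs-crc", None))                  # True
# With -skipcrccheck, size becomes the only criterion: file is skipped.
print(should_copy(4, 4, "hdfs-crc", None, skip_crc=True))   # False
```

If this model is right, adding -skipcrccheck alongside -update would make DistCp compare sizes only when syncing to s3a.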
... View more
06-18-2018
01:19 PM
Hi, When doing a "show create table TableX", the column names are not escaped with "`", so if a column uses a reserved keyword ("comment" for example), we can't just copy-paste the returned query into another database. Btw, I was thinking about opening a JIRA bug for this instead of posting it here, but I wasn't sure about the preferred way for you guys... let me know what the preferred process is for this kind of problem. thanks
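To illustrate the fix being asked for, a minimal sketch of identifier quoting (a hypothetical helper, not Hive/Impala code; it assumes backquotes inside an identifier are escaped by doubling them):

```python
def quote_ident(name):
    """Wrap an identifier in backquotes, doubling any embedded backquote,
    so reserved words like `comment` survive a copy-paste round trip."""
    return "`" + name.replace("`", "``") + "`"

# Emitting DDL with every column quoted keeps reserved keywords safe:
cols = [("id", "INT"), ("comment", "STRING")]
ddl = "CREATE TABLE t (" + ", ".join(
    f"{quote_ident(c)} {t}" for c, t in cols) + ")"
print(ddl)  # CREATE TABLE t (`id` INT, `comment` STRING)
```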
... View more
03-05-2018
04:04 PM
Hi, Impyla connects using Kerberos; we are not using LDAP. I have configured the load balancer as stated in the docs, but I still get the same error. thanks
... View more
02-21-2018
03:48 PM
Hi, I have an Impala cluster with Kerberos and HAProxy, and everything works fine when I connect using impyla. But when I do (after a kinit) impala-shell -k and then connect myHaproxy:21051; I get: Error: Unable to communicate with impalad service. This service may not be an impalad instance. Check host:port and try again.
Traceback (most recent call last):
File "/opt/cloudera/parcels/CDH-5.14.0-1.cdh5.14.0.p0.24/bin/../lib/impala-shell/impala_shell.py", line 1554, in <module>
shell.cmdloop(intro)
File "/usr/lib/python2.7/cmd.py", line 142, in cmdloop
stop = self.onecmd(line)
File "/opt/cloudera/parcels/CDH-5.14.0-1.cdh5.14.0.p0.24/bin/../lib/impala-shell/impala_shell.py", line 563, in onecmd
return cmd.Cmd.onecmd(self, line)
File "/usr/lib/python2.7/cmd.py", line 221, in onecmd
return func(arg)
File "/opt/cloudera/parcels/CDH-5.14.0-1.cdh5.14.0.p0.24/bin/../lib/impala-shell/impala_shell.py", line 717, in do_connect
self._connect()
File "/opt/cloudera/parcels/CDH-5.14.0-1.cdh5.14.0.p0.24/bin/../lib/impala-shell/impala_shell.py", line 764, in _connect
result = self.imp_client.connect()
File "/opt/cloudera/parcels/CDH-5.14.0-1.cdh5.14.0.p0.24/lib/impala-shell/lib/impala_client.py", line 245, in connect
result = self.ping_impala_service()
File "/opt/cloudera/parcels/CDH-5.14.0-1.cdh5.14.0.p0.24/lib/impala-shell/lib/impala_client.py", line 250, in ping_impala_service
return self.imp_service.PingImpalaService()
File "/opt/cloudera/parcels/CDH-5.14.0-1.cdh5.14.0.p0.24/lib/impala-shell/gen-py/ImpalaService/ImpalaService.py", line 223, in PingImpalaService
return self.recv_PingImpalaService()
File "/opt/cloudera/parcels/CDH-5.14.0-1.cdh5.14.0.p0.24/lib/impala-shell/gen-py/ImpalaService/ImpalaService.py", line 238, in recv_PingImpalaService
raise x
thrift.Thrift.TApplicationException: Invalid method name: 'PingImpalaService'
Any idea why? thanks
... View more
12-20-2017
12:27 PM
Hi, I am trying to parse a date in Impala using the unix_timestamp function: select unix_timestamp('Thu Dec 17 15:55:08 IST 2015', 'EEE MMM d HH:mm:ss ZZZ yyyy') But I get the following error: Bad date/time conversion format: EEE MMM d HH:mm:ss ZZZ yyyy The same query works in Hive. Is there something I need to change in the pattern to make it work in Impala? I am currently on CDH 5.12. thanks
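As a workaround sketch, assuming the parsing can happen outside Impala (e.g. in an ingest script), the tokens the error complains about (the day-of-week and the timezone abbreviation) can be stripped before parsing what remains:

```python
import re
from datetime import datetime

s = "Thu Dec 17 15:55:08 IST 2015"
# Drop the day-of-week token and the timezone abbreviation, keeping
# month, day, time, and year, which a plain strptime pattern can handle.
m = re.match(r"\w{3} (\w{3} \d{1,2} \d{2}:\d{2}:\d{2}) \w{3,4} (\d{4})", s)
dt = datetime.strptime(m.group(1) + " " + m.group(2), "%b %d %H:%M:%S %Y")
print(dt.isoformat())  # 2015-12-17T15:55:08
```

Note this discards the timezone entirely; if the offset matters, it would need to be applied separately.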
... View more
05-15-2017
09:15 PM
Hi Alex, Thanks for your answer. I am not a huge fan of giving the exact column names, but I can tell you that both tables are partitioned by date in string format "YYYY-MM-DD" and have 1798 columns, of which 1795 are strings and 3 are ints. I hope that helps. thanks
... View more
05-15-2017
12:07 PM
Hi, I created a view on top of a table and it gives me really bad performance. It takes about 2 min to "Plan" the query on the view, whereas it only takes 55 ms on the raw table. The view is really just a select * from table_a union select * from table_b, where table a and b have the same definition and are both Parquet tables... Slow query: https://gist.github.com/anonymous/0af6f862a8069ee7b933deafe9cc0880 Fast query: https://gist.github.com/anonymous/2ef16a696527acd86795c3080edf3e91 Any idea what is happening? thanks
... View more
04-21-2017
01:32 AM
thanks! If you open a JIRA, can you send me the link? I will probably disable codegen for now and wait until you push a fix to re-enable it. thanks
... View more
04-20-2017
11:10 PM
It seems to be coming from Avro. I created the table as Parquet and it took 0.48 sec. The table has about 900 columns, so nothing too fancy. thanks
... View more
04-20-2017
09:56 PM
It is a string that looks like "YYYY-MM-DD"; the table is stored as Avro. I can try using Parquet or text if you want
... View more
04-20-2017
09:08 PM
Hi, using Impala 2.7(8) with CDH 5.10.1 here. I am trying a simple query: `select distinct(date_col_partition) from table_1` and it is taking 20 sec. But when I do a set DISABLE_CODEGEN=true; it takes less than a second. Here is the profile gist: https://gist.github.com/anonymous/1a5faa3a10d4495f7b8abc3c964457db Any idea what is going wrong? thanks
... View more
03-20-2017
11:53 AM
"It's an unfortunate limitation that we are working on addressing" — I would greatly appreciate it if you could :). With a lambda architecture, most of my queries hit a "union" view. For Avro you are right: it was also because of the union. I just tried with MT_DOP and it gave me the expected results. thanks
... View more
03-09-2017
03:40 PM
Hey. You are right about the last profile, my bad... Let's do all 4 use cases this time. Here are the profiles on the raw table: no mt_dop on table: https://gist.github.com/momohuri/0b91468ae2526c4f5b0c4dba09172147 takes about the same time as: mt_dop=10 on table: https://gist.github.com/momohuri/62fb85bb251490b01d0ee7032f377cef For the view: no mt_dop (it still takes the same amount of time as the table): https://gist.github.com/momohuri/5954233819c54376c85c795b006f91ad mt_dop=10 (it takes 5x the time): https://gist.github.com/momohuri/2263742803326fc2467e96a2e2a9cb5a And you are right on the observation: the main difference I see in the profile/summary is the time spent on HDFS. thanks PS: One other observation: if I create a view of Parquet + Avro and query it with mt_dop=10, it seems to do the equivalent of mt_dop=0. I didn't run extensive tests on it, but that's my first impression
... View more
03-09-2017
01:45 PM
Hi, I have a view defined as `select * from t1 union select * from t2`. The 2 tables have an identical schema; t2 contains no data. Now three use cases:
- query with mt_dop=10 on the view: query is slow, ~10s https://gist.github.com/momohuri/9ad4ba8f6fbd1d180068c8c102291f69
- query with mt_dop=0 on the view: query is fast, ~1s https://gist.github.com/momohuri/c4347eb7ef70a8eec63a0a62638d1ce7
- query with mt_dop=10 not using a view/union: query is fast, ~1s https://gist.github.com/momohuri/62a00d9e381771aa9172b6dea09a5191
I hope those observations can help for the next releases 🙂 thanks
... View more
03-07-2017
12:27 PM
Hi, actually, after investigation the problem was completely unrelated to Impala... One of the machines in our cluster had a 100 Mbps Ethernet cable instead of a 10 Gbps one... thanks for your help
... View more
03-01-2017
07:31 PM
Hi, I have set up a new cluster with pretty much the same configuration as prod, and a similar number of machines. The new cluster is using CDH 5.10.0; the old one is using 5.9.1. This example is not as bad as what I saw before, but still... Impala 2.7: 79.49s https://gist.github.com/momohuri/38e5cce6d4f4dc1c45ac6db18fbc1a82 Impala 2.8: 129.59s https://gist.github.com/momohuri/9544c5a97e9ec40ea1ec71caf1f5a030 query 2: Impala 2.7: 62s https://gist.github.com/momohuri/c11f5cc7dc336af5ad1b1b605c523a1a Impala 2.8: 111s https://gist.github.com/momohuri/81586f032e24c3c530e49da75816acd3 The main difference that I see is the number of hosts: in 2.8 it is only using 3 hosts, but there are 8 available. Is there a reason for that? Is there something else that I am missing?
... View more
02-16-2017
03:41 PM
Unfortunately we have inserted a lot of data into those partitions since yesterday... And I didn't download the profile when I did my test on CDH 5.9.1; I just noted the time taken.
... View more
02-16-2017
01:16 PM
Hi, I tried to update to 5.10 yesterday, and I believe to Impala 2.8 (the logs/shell still say Impala 2.7 for CDH 5.10). And I was surprised by a big drop in performance for most of my queries. For queries with no join, using "set mt_dop=10" improved performance by a lot, but all the queries with "mt_dop=0" got way worse. In the shell I ran an aggregation query with no join, and it "Fetched 360 row(s) in 100.28s"; the summary is the following:
+---------------------+--------+----------+----------+---------+------------+-----------+---------------+---------------------------------------------------+
| Operator | #Hosts | Avg Time | Max Time | #Rows | Est. #Rows | Peak Mem | Est. Peak Mem | Detail |
+---------------------+--------+----------+----------+---------+------------+-----------+---------------+---------------------------------------------------+
| 10:MERGING-EXCHANGE | 1 | 150.62us | 150.62us | 360 | 500 | 0 B | -1 B | UNPARTITIONED |
| 05:TOP-N | 9 | 137.12us | 218.93us | 360 | 500 | 16.00 KB | 11.72 KB | |
| 09:AGGREGATE | 9 | 10.72ms | 14.33ms | 360 | 6.15M | 10.89 MB | 17.20 MB | FINALIZE |
| 08:EXCHANGE | 9 | 86.09us | 95.65us | 3.24K | 6.15M | 0 B | 0 B | HASH(`date (hourly)`) |
| 04:AGGREGATE | 9 | 364.87ms | 384.52ms | 3.24K | 6.15M | 2.03 MB | 17.20 MB | STREAMING |
| 07:AGGREGATE | 9 | 5.27s | 5.52s | 38.07M | 226.95M | 714.15 MB | 18.60 GB | FINALIZE |
| 06:EXCHANGE | 9 | 608.64ms | 628.24ms | 115.78M | 226.95M | 0 B | 0 B | HASH(udid,`date (hourly)`) |
| 03:AGGREGATE | 9 | 17.56s | 24.24s | 115.78M | 226.95M | 2.63 GB | 18.60 GB | STREAMING |
| 00:UNION | 9 | 1.67s | 2.47s | 226.95M | 226.95M | 1.93 MB | 0 B | |
| |--02:SCAN HDFS | 9 | 543.42us | 602.49us | 0 | 0 | 0 B | 0 B | pocketgems_prod.customevent_chapterview_streaming |
| 01:SCAN HDFS | 9 | 291.51ms | 1.09s | 226.95M | 226.95M | 42.01 MB | 1.29 GB | pocketgems_prod.customevent_chapterview_batch |
+---------------------+--------+----------+----------+---------+------------+-----------+---------------+---------------------------------------------------+
It used to take around 35 seconds. I also downloaded the profile: https://gist.github.com/momohuri/c03683cd4263f48c1de5afd314d2662f The thing that surprised me is this RowsReturnedRate: 3. Any clue why this is happening? For now I went back to CDH 5.9.1. Thanks
... View more
01-26-2017
10:35 AM
Were you able to look at the profile, by any chance? thanks
... View more
01-25-2017
11:02 AM
Hi, I am trying to run the query by connecting directly to Impala using impala-shell on one of the daemon machines. I only use HAProxy, no other load balancer. This is what I get with the profile: compute incremental stats my_table;
Query: compute incremental stats my_table
WARNINGS:
Memory limit exceeded
The memory limit is set too low to initialize spilling operator (id=3). The minimum required memory to spill this operator is 272.00 MB.
Column some_column_name does not have statistics, recomputing stats for the whole table
[my_machine:21000] > profile;
Query Runtime Profile:
Query (id=c2428c691af2dcaa:55837fed00000000):
Summary:
Session ID: bb4992202447c47e:fe9f65a64f6da581
Session Type: BEESWAX
Start Time: 2017-01-25 10:58:31.528250000
End Time: 2017-01-25 10:59:00.884573000
Query Type: DDL
Query State: EXCEPTION
Query Status:
Memory limit exceeded
The memory limit is set too low to initialize spilling operator (id=3). The minimum required memory to spill this operator is 272.00 MB.
Impala Version: impalad version 2.7.0-cdh5.9.1 RELEASE (build 24ad6df788d66e4af9496edb26ac4d1f1d2a1f2c)
User: my_user
Connected User: my_user
Delegated User:
Network Address: ::ffff:172.16.0.221:46893
Default Db: my_db
Sql Statement: compute incremental stats my_table
Coordinator: my_machine:22000
Query Options (non default):
DDL Type: COMPUTE_STATS
: 0.000ns
Query Timeline: 29s356ms
- Start execution: 66.160us (66.160us)
- Planning finished: 28.190ms (28.124ms)
- Request finished: 29s090ms (29s062ms)
- Unregister query: 29s356ms (265.969ms)
ImpalaServer:
- ClientFetchWaitTimer: 0.000ns
- RowMaterializationTimer: 0.000ns
... View more
01-24-2017
12:24 PM
Hi, I am trying to compute incremental stats for one large table (~200 GB), but I get an out-of-memory error: Memory limit exceeded The memory limit is set too low to initialize spilling operator (id=3). The minimum required memory to spill this operator is 272.00 MB. It is a little strange to see that, because I have the memory limit in the daemon and in the shell set to 80 GB. Anyway, I investigated a little more and found this in the logs: W0124 12:13:21.800235 6746 HdfsScanNode.java:654] Per-host mem cost 8.25GB exceeded per-host upper bound 7.50GB. I get the same error whether I do it for the whole table or just one partition at a time. I couldn't find the parameter to increase the HdfsScanNode upper bound. Any idea how I could solve that? thanks ps: I am using CDH 5.9.1
... View more
12-02-2016
02:04 PM
Thanks for your answer! I actually want to group by all possible single-bit masks. Thanks to you I think I have a solution: I will first construct a table with all my masks (1, 2, 4, 8, ...) and then join it with my table using an "and" (bitwise AND) operator. That should "explode" my rows into all possible groups; filtering out "bit"=0 leaves only the valid groups. Not exactly sure how my query will look yet... but I think it should be possible that way. thanks
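The mask-join idea above can be sketched in a few lines of Python (toy data and names, purely for illustration): cross each row with every single-bit mask, keep the combinations where the bit is set, then group by mask.

```python
from collections import defaultdict

# Toy rows of ("user", some_integer); the integer is a bitfield.
rows = [("alice", 0b101), ("bob", 0b011), ("carol", 0b100)]
masks = [1 << i for i in range(3)]  # 1, 2, 4 — the mask table

# The join with an AND filter "explodes" rows into (mask, user) pairs,
# keeping only pairs where the bit is set, then groups by mask.
groups = defaultdict(set)
for user, value in rows:
    for m in masks:
        if value & m:           # the SQL "and" filter: drop bit=0 pairs
            groups[m].add(user)

counts = {m: len(users) for m, users in groups.items()}
print(counts)  # {1: 2, 2: 1, 4: 2}
```

This mirrors the SQL plan: mask table CROSS JOIN data table, WHERE (value & mask) != 0, GROUP BY mask with count(distinct user).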
... View more
12-01-2016
09:03 PM
Hi, I have a table with two fields, "user" and "some_integer", and I was wondering if there is a way of doing something like: select count(distinct("user")), "bit" from random_table group by bit(some_number); Thanks
... View more
11-14-2016
02:35 PM
2 Kudos
Hi, I rebooted the machine that hosts Cloudera Manager, but after the reboot I got a problem. In Parcels I see that CDH 5.9.0-1.cdh5.9.0.p0.23 is being distributed, but it is stuck on one machine at the activation step. I tried to restart and hard-restart the agent and the server, but I always get the same problem. The machine it is stuck on is the same machine that hosts Cloudera Manager, so there shouldn't be any network problem. The agent log keeps looping over the same messages: [14/Nov/2016 14:30:08 +0000] 24070 MainThread client_configs ERROR Failed to deploy client config <hadoop-conf,/etc/hadoop/conf.cloudera.hdfs>: No parcel provided required tags: set([u'cdh'])
Traceback (most recent call last):
File "/usr/lib/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.9.0-py2.7.egg/cmf/client_configs.py", line 769, in rectify
deploy_path = self._deploy_client_config(new_ccs[key])
File "/usr/lib/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.9.0-py2.7.egg/cmf/client_configs.py", line 479, in _deploy_client_config
env, parcels_in_use = self._adapt_cc_to_env(self.deploy_env, cc)
File "/usr/lib/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.9.0-py2.7.egg/cmf/client_configs.py", line 670, in _adapt_cc_to_env
cc["optional_tags"])
File "/usr/lib/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.9.0-py2.7.egg/cmf/parcel.py", line 383, in prepare_environment
raise ParcelTagUnsatisfiedException("No parcel provided required tags: %s" % (missing_reqs,))
ParcelTagUnsatisfiedException: No parcel provided required tags: set([u'cdh'])
[14/Nov/2016 14:30:08 +0000] 24070 Thread-13 https ERROR Failed to retrieve/stroe URL: http://my_server:7180/cmf/parcel/download/CDH-5.9.0-1.cdh5.9.0.p0.23-trusty.parcel.torrent -> /opt/cloudera/parcel-cache/CDH-5.9.0-1.cdh5.9.0.p0.23-trusty.parcel.torrent HTTP Error 404: Not Found
Traceback (most recent call last):
File "/usr/lib/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.9.0-py2.7.egg/cmf/https.py", line 175, in fetch_to_file
resp = self.open(req_url)
File "/usr/lib/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.9.0-py2.7.egg/cmf/https.py", line 170, in open
return self.opener(*pargs, **kwargs)
File "/usr/lib/python2.7/urllib2.py", line 127, in urlopen
return _opener.open(url, data, timeout)
File "/usr/lib/python2.7/urllib2.py", line 410, in open
response = meth(req, response)
File "/usr/lib/python2.7/urllib2.py", line 523, in http_response
'http', request, response, code, msg, hdrs)
File "/usr/lib/python2.7/urllib2.py", line 448, in error
return self._call_chain(*args)
File "/usr/lib/python2.7/urllib2.py", line 382, in _call_chain
result = func(*args)
File "/usr/lib/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.9.0-py2.7.egg/cmf/https.py", line 205, in http_error_default
raise e
HTTPError: HTTP Error 404: Not Found
[14/Nov/2016 14:30:08 +0000] 24070 Thread-13 downloader ERROR Failed fetching torrent: HTTP Error 404: Not Found
Traceback (most recent call last):
File "/usr/lib/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.9.0-py2.7.egg/cmf/downloader.py", line 263, in download
cmf.https.ssl_url_opener.fetch_to_file(torrent_url, torrent_file)
File "/usr/lib/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.9.0-py2.7.egg/cmf/https.py", line 175, in fetch_to_file
resp = self.open(req_url)
File "/usr/lib/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.9.0-py2.7.egg/cmf/https.py", line 170, in open
return self.opener(*pargs, **kwargs)
File "/usr/lib/python2.7/urllib2.py", line 127, in urlopen
return _opener.open(url, data, timeout)
File "/usr/lib/python2.7/urllib2.py", line 410, in open
response = meth(req, response)
File "/usr/lib/python2.7/urllib2.py", line 523, in http_response
'http', request, response, code, msg, hdrs)
File "/usr/lib/python2.7/urllib2.py", line 448, in error
return self._call_chain(*args)
File "/usr/lib/python2.7/urllib2.py", line 382, in _call_chain
result = func(*args)
File "/usr/lib/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.9.0-py2.7.egg/cmf/https.py", line 205, in http_error_default
raise e
HTTPError: HTTP Error 404: Not Found
[14/Nov/2016 14:30:23 +0000] 24070 MainThread parcel INFO The following requested parcels are not available: {u'CDH': u'5.9.0-1.cdh5.9.0.p0.23'}
Any idea? thanks
... View more
11-01-2016
09:31 PM
Hi, quick question on performance: if I have 2 tables, the first one with columns "a,b" and the second one with columns "c,d", and I create a view like the following: CREATE VIEW my_view AS (
select a,b,null,null from table_1
union
select null,null,c,d from table_2) Now if I run a simple query like select a from my_view, will the query only read from table_1, or will the entire table_2 also be scanned? (I am mainly worried about disk reads.) Thanks
... View more
- Tags:
- impala
- performance
09-29-2016
12:37 PM
Hi,
I just saw that blog post about spark 2.0 beta : http://blog.cloudera.com/blog/2016/09/apache-spark-2-0-beta-now-available-for-cdh/
And I have a quick question: once Spark 2.0 is installed on the cluster, how do I choose whether my job runs on Spark 2.0 or 1.6?
thanks
... View more
09-26-2016
12:52 PM
Hi, I upgraded Impala to 2.6. The query aggregation improved by about 15%. Is there an open ticket or an expected release date/version for the "full parallelization"? thanks
... View more
09-22-2016
07:23 PM
Hi, I will update to 2.6 over the weekend and post the results. I have 32 cores per host available to the Impala daemon. If you say that 10 million records are being processed in parallel, I guess you imply that only one core is used per host (268M rows / 6 hosts / 4 sec = ~11 million). Is it expected to have only 1 core used per node? Did I miss something in the configuration? Or is it because of the multi-threaded aggregation improvement that you are working on? I just want to make sure I didn't miss any obvious optimization. And just so you know, the column is of type "string". thanks
... View more
09-21-2016
03:43 PM
Hi, I am running Impala 2.5 on CDH 5.7.3. I am currently benchmarking a simple query: select count(*),`session_id` from flat_table group by `session_id` limit 10; Here are the results of 'summary':
+--------------+--------+----------+----------+---------+------------+-----------+---------------+-----------------------------------------+
| Operator | #Hosts | Avg Time | Max Time | #Rows | Est. #Rows | Peak Mem | Est. Peak Mem | Detail |
+--------------+--------+----------+----------+---------+------------+-----------+---------------+-----------------------------------------+
| 04:EXCHANGE | 1 | 13.63us | 13.63us | 10 | 10 | 0 B | -1 B | UNPARTITIONED |
| 03:AGGREGATE | 6 | 1.11s | 1.15s | 60 | 247.06M | 171.09 MB | 128.00 MB | FINALIZE |
| 02:EXCHANGE | 6 | 86.76ms | 92.08ms | 12.94M | 247.06M | 0 B | 0 B | HASH(session_id) |
| 01:AGGREGATE | 6 | 4.07s | 6.14s | 12.94M | 247.06M | 525.03 MB | 128.00 MB | STREAMING |
| 00:SCAN HDFS | 6 | 337.83ms | 494.40ms | 268.67M | 247.06M | 145.36 MB | 88.00 MB | flat_table |
+--------------+--------+----------+----------+---------+------------+-----------+---------------+-----------------------------------------+
We can easily see that most of the time is going into the aggregate part, and I have a lot of queries with the same bottleneck. I have control over the hardware and the Impala configuration. The table is a Parquet table, cached in HDFS, with incremental stats for each partition. Am I missing something, or is this the expected performance for a query like this? Thanks
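For readers of the summary above: the plan is a two-phase aggregation — each host pre-aggregates locally (01:AGGREGATE STREAMING), partial results are redistributed by hash(session_id) (02:EXCHANGE), and each host merges its share (03:AGGREGATE FINALIZE). A toy Python sketch of that shape (hypothetical data, not Impala internals):

```python
from collections import Counter

def pre_aggregate(rows):
    """Per-host STREAMING phase: partial count per key."""
    return Counter(rows)

def exchange(partials, n_hosts):
    """Redistribute partial counts so each key lands on exactly one host."""
    shards = [Counter() for _ in range(n_hosts)]
    for partial in partials:
        for key, cnt in partial.items():
            shards[hash(key) % n_hosts][key] += cnt
    return shards

def finalize(shards):
    """FINALIZE phase: merge partials into final counts (combined here)."""
    total = Counter()
    for shard in shards:
        total.update(shard)
    return total

hosts = [["s1", "s2", "s1"], ["s2", "s3"]]      # rows scanned per host
partials = [pre_aggregate(h) for h in hosts]
result = finalize(exchange(partials, 2))
print(dict(result))  # {'s1': 2, 's2': 2, 's3': 1}
```

The pre-aggregation only pays off when keys repeat within a host; with a high-cardinality key like session_id, most time ends up in the hashing phases, which matches the 01/03 AGGREGATE rows dominating the summary.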
... View more