About tmarshall

tmarshall · ‎04-05-2019

Another thing worth mentioning about Impala bulk inserts into Kudu: you may benefit from the /* +noshuffle,noclustered */ insert hint, or from setting a MEM_LIMIT, as outlined here: https://www.cloudera.com/documentation/enterprise/6/6.2/topics/impala_hints.html
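To make the hint and the memory limit concrete, here is a minimal sketch that only builds the statement text; it does not connect to Impala. The helper name `kudu_bulk_insert_sql`, the table names, and the `4g` value are made up for illustration, and placing the hint directly before SELECT is my reading of the hints documentation linked above, so please check it against your Impala version.

```python
def kudu_bulk_insert_sql(target, source, mem_limit=None):
    """Build statement text for a hinted INSERT ... SELECT into a Kudu table.

    Hypothetical helper: it only formats strings and runs nothing.
    """
    statements = []
    if mem_limit is not None:
        # MEM_LIMIT is an Impala query option; the value is up to you.
        statements.append(f"SET MEM_LIMIT={mem_limit}")
    # Assumption: the hint comment sits immediately before SELECT.
    statements.append(
        f"INSERT INTO {target} /* +noshuffle,noclustered */ "
        f"SELECT * FROM {source}"
    )
    return statements


# Table names and the limit below are placeholders, not recommendations.
stmts = kudu_bulk_insert_sql("kudu_table", "some_hdfs_table", mem_limit="4g")
for stmt in stmts:
    print(stmt + ";")
```

Whether the hint or the memory limit helps will depend on the workload, so it is worth timing the insert both ways.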

tmarshall · ‎04-03-2019

I suspect your timestamps are off by one hour because Impala treats the timestamps it stores in Kudu as UTC. Impala holds a timestamp internally as a 96-bit value with nanosecond precision and converts it to/from Unix time, which Kudu stores as a 64-bit int with microsecond precision. So you should be able to solve the issue by treating all timestamps in your application as UTC. The discussion in this thread may be useful: https://lists.apache.org/thread.html/bb4ef37c88e76959399f40c7053a76b644217e76664982a60c703c7e@%3Cuser.impala.apache.org%3E

For performance, I'm interested in more details. If you're doing something like 'insert into <kudu_table> values (...)' to insert a few rows at a time through Impala, then you'll definitely get better performance by going through the Kudu API, because going through Impala you pay extra cost for query parsing, planning, etc. Impala is better suited to statements like 'insert into <kudu_table> select * from <some_hdfs_table>'.

And as Hao pointed out, there is overhead going through Impala because of the conversion from the 96-bit UTC timestamp to 64-bit Unix time, so you may want to make the Impala type a bigint and only convert to/from timestamps when necessary.
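As an illustration of "treat everything as UTC", here is a small Python sketch that converts a timezone-aware datetime to Unix microseconds (the representation described above for Kudu) and back. The function names and the UTC+1 example zone are my own choices for the example; this is not the Kudu or Impala API.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_unix_micros(dt):
    """Convert a timezone-aware datetime to microseconds since the Unix epoch."""
    if dt.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    return (dt.astimezone(timezone.utc) - EPOCH) // timedelta(microseconds=1)


def from_unix_micros(micros):
    """Convert microseconds since the Unix epoch to a UTC datetime."""
    return EPOCH + timedelta(microseconds=micros)


# A wall-clock time in a hypothetical UTC+1 zone...
local = datetime(2019, 4, 3, 12, 0, 0, tzinfo=timezone(timedelta(hours=1)))
micros = to_unix_micros(local)
# ...reads back one hour earlier when interpreted as UTC.
as_utc = from_unix_micros(micros)
print(local.isoformat(), "->", as_utc.isoformat())
```

If the application writes local wall-clock values but the reader interprets them as UTC (or the other way around), the difference shows up as exactly this kind of fixed offset.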

tmarshall · ‎03-14-2018

You can see the number of queued/admitted queries and the memory used per pool by inspecting the logs, e.g. impalad.INFO. Look for lines that come from admission-controller.cc. By default, each time a query is admitted we log these stats for the query's pool. If the default log level doesn't provide enough info, additional info is logged at higher levels, though keep in mind that more logging may slow queries down.
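If you'd rather not scan the log by eye, a short script can pull out just those lines. This is a minimal sketch: the helper names are made up, it matches on the source file name only, and it makes no assumption about the rest of the log line format. The log location differs between installs, so pass in whatever path your impalad uses.

```python
def admission_lines(lines, marker="admission-controller.cc"):
    """Return the log lines that mention the admission controller source file."""
    return [line.rstrip("\n") for line in lines if marker in line]


def read_admission_lines(path):
    """Read a log file (e.g. impalad.INFO) and keep only the matching lines."""
    with open(path, encoding="utf-8", errors="replace") as log:
        return admission_lines(log)
```

For example, `read_admission_lines("impalad.INFO")` returns the matching lines from a log file in the current directory.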

tmarshall · ‎03-13-2018

It's difficult to say what might be going on without more information, but a few pieces of info may be helpful to you:

- "Unspecified GSS error" generally indicates an issue with Kerberos authentication.
- The bitness of the ODBC driver must match the bitness of the client application in order for everything to work correctly.

From your description, it may be that the user is using a 32-bit Kerberos client, in which case it is expected that the 64-bit ODBC driver would not work with it. Given that the 32-bit connection is fine, what's the motivation for trying to get the 64-bit driver to work?
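As an aside on checking bitness: if the client application happens to be a Python script, the pointer size tells you whether the interpreter is a 32-bit or 64-bit process. This sketch only covers that one case (other client applications need their own check), and `process_bitness` is just a name picked for the example.

```python
import struct


def process_bitness():
    """Return 32 or 64 depending on the pointer size of the running interpreter."""
    return struct.calcsize("P") * 8


if __name__ == "__main__":
    print(f"This Python process is {process_bitness()}-bit")
```

The ODBC driver loaded into that process would need to have the same bitness.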

tmarshall · ‎03-13-2018

Impala has a web UI, by default served on port 25000 of each impalad node, that you may find useful. In particular, the /queries page displays a list of running or recent queries along with the resource pool each was assigned to.
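For reference, the address of that page can be put together as below. This is a trivial sketch: the host name is a placeholder, `queries_url` is a made-up helper, and it assumes plain HTTP on the default port, so adjust it if your cluster enables TLS or uses a different port.

```python
def queries_url(host, port=25000, scheme="http"):
    """Build the URL of the /queries page on an impalad's web UI."""
    return f"{scheme}://{host}:{port}/queries"


# Placeholder host name; substitute one of your impalad nodes.
url = queries_url("impalad-1.example.com")
print(url)
```

Opening that URL in a browser is usually the simplest way to see which pool each query landed in.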

tmarshall · ‎06-02-2017

It looks like you may be experiencing the same issue as: https://community.cloudera.com/t5/Batch-SQL-Apache-Hive/Hive-JDBC-client-error-when-connecting-to-Kerberos-Cloudera/td-p/30829 Can you try the solution shown there?

Member Since	‎06-02-2017 01:27 PM
Last Visited	‎01-13-2021 07:10 PM
Posts	8
Kudos received	2

Cloudera Community

Re: kudu versus impala timestamp

Re: How to monitor resources of impala queue

Re: Impala 32Bit vs 64bit ODBC

Re: Issue JDBC connectivity using Kerberoes - Impa...