04-05-2019
09:59 AM
Another thing worth mentioning about Impala bulk inserts into Kudu: you may benefit from the /* +noshuffle,noclustered */ insert hint, or from setting the MEM_LIMIT query option, as outlined here: https://www.cloudera.com/documentation/enterprise/6/6.2/topics/impala_hints.html
04-03-2019
01:22 PM
1 Kudo
I suspect the reason that your timestamps are off by one hour is that Impala stores timestamps in Kudu as UTC. In Impala a timestamp is a 96-bit value with nanosecond precision; in Kudu it is stored as unix time in a 64-bit int with microsecond precision, and Impala converts between the two. So you should be able to solve that issue by treating all timestamps in your application as UTC. The discussion in this thread may be useful: https://lists.apache.org/thread.html/bb4ef37c88e76959399f40c7053a76b644217e76664982a60c703c7e@%3Cuser.impala.apache.org%3E

For performance, I'm interested in more details. If you're doing something like 'insert into <kudu_table> values (...)' to insert a few rows at a time through Impala, then you'll definitely get better performance by going through the Kudu API, since going through Impala you pay extra cost for query parsing, planning, etc. Impala is better suited to things like 'insert into <kudu_table> select * from <some_hdfs_table>'.

And as Hao pointed out, there is overhead going through Impala because of the conversion from 96-bit UTC to 64-bit unix time, so you may want to make the Impala column type a bigint and only convert to/from timestamps when necessary.
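To make the UTC point concrete, here is a minimal Python sketch of what "treat everything as UTC" could look like on the application side if you go the bigint route. The helper names (to_unix_micros, from_unix_micros) and the UTC+1 zone are my own assumptions for illustration, not part of the Impala or Kudu APIs. It also shows how labelling a local wall-clock time as UTC produces exactly a one-hour shift.

```python
from datetime import datetime, timedelta, timezone

# Unix epoch as a timezone-aware UTC datetime.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_unix_micros(dt: datetime) -> int:
    """Convert a timezone-aware datetime to microseconds since the Unix epoch."""
    if dt.tzinfo is None:
        # A naive datetime has no defined offset, so refuse to guess one.
        raise ValueError("naive datetime: attach a timezone before converting")
    # Integer division of two timedeltas yields an exact int.
    return (dt - EPOCH) // timedelta(microseconds=1)


def from_unix_micros(micros: int) -> datetime:
    """Convert microseconds since the Unix epoch back to a UTC datetime."""
    return EPOCH + timedelta(microseconds=micros)


# Assumption for the example: the application runs in a UTC+1 zone.
local_zone = timezone(timedelta(hours=1))
local = datetime(2019, 4, 3, 13, 22, 0, tzinfo=local_zone)

# Correct: the offset is taken into account, so the stored value is true UTC.
micros = to_unix_micros(local)
utc_value = from_unix_micros(micros)

# Incorrect: the local wall-clock reading is labelled as UTC without conversion.
mislabeled = to_unix_micros(local.replace(tzinfo=timezone.utc))

print(utc_value.isoformat())                # 2019-04-03T12:22:00+00:00
print((mislabeled - micros) // 1_000_000)   # 3600 (seconds, i.e. one hour off)
```

The integer from to_unix_micros is what you would write to a bigint column; you only convert back with from_unix_micros (or on the Impala side) when you actually need a timestamp.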