About Grant Henke

Grant Henke · ‎11-13-2019

Can you try explicitly casting the string value to a timestamp? I don't think Spark will push down the timestamp predicate if it's a string. This is tracked in https://issues.apache.org/jira/browse/KUDU-2821.
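
As a rough sketch of what that cast could look like from PySpark: the helper below builds a filter expression that compares the column against a typed TIMESTAMP instead of a bare string. The helper names, the column name `event_time`, and the `yyyy-MM-dd HH:mm:ss` input format are assumptions for illustration, not something taken from the original question.

```python
from datetime import datetime


def timestamp_predicate(column, op, value):
    """Build a Spark SQL predicate that compares `column` to a TIMESTAMP.

    `value` is expected as 'YYYY-MM-DD HH:MM:SS' (an assumed format).
    Parsing it first rejects malformed input before it reaches Spark.
    """
    ts = datetime.strptime(value, "%Y-%m-%d %H:%M:%S")
    return f"{column} {op} CAST('{ts:%Y-%m-%d %H:%M:%S}' AS TIMESTAMP)"


def filter_by_timestamp(df, column, op, value):
    """Apply the predicate to a Spark DataFrame (hypothetical helper).

    DataFrame.filter accepts a SQL expression string, so no pyspark
    import is needed here.
    """
    return df.filter(timestamp_predicate(column, op, value))


# Hypothetical column name and cutoff value.
predicate = timestamp_predicate("event_time", ">=", "2019-11-01 00:00:00")
print(predicate)
```

Whether the predicate is actually pushed down can be checked by calling `explain()` on the filtered DataFrame and looking for it in the pushed filters of the scan.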

Grant Henke · ‎03-29-2019

I suspect that the large number of "Key already present" errors may play a part in the initial buffer size warning. The message `Key already present in Kudu table 'impala::db.REPORTKUDU'. (1 of 9006 similar)` is telling you that you are trying to insert records whose primary key is not unique. The table you are selecting from must have multiple records for the same `Marketing_Cloud_Visitor_ID`. I would suggest writing the create table statement so that the primary key is unique, or adjusting the select statement to deduplicate the records before inserting into Kudu.
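
To illustrate the deduplication option, here is a minimal sketch that uses an in-memory SQLite table as a stand-in for the source table. Only `Marketing_Cloud_Visitor_ID` comes from the thread; the table name `source`, the `page_views` column, and the choice of `MAX` as the tie-breaker are assumptions. The same `GROUP BY` shape should carry over to the select statement that feeds the Kudu insert.

```python
import sqlite3


def dedupe_by_visitor_id(rows):
    """Return one row per Marketing_Cloud_Visitor_ID.

    SQLite stands in for the real source table. `page_views` is a
    hypothetical non-key column, and MAX() is an arbitrary way to
    choose which duplicate survives.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE source ("
        "Marketing_Cloud_Visitor_ID TEXT, page_views INTEGER)"
    )
    conn.executemany("INSERT INTO source VALUES (?, ?)", rows)
    result = conn.execute(
        "SELECT Marketing_Cloud_Visitor_ID, MAX(page_views) "
        "FROM source "
        "GROUP BY Marketing_Cloud_Visitor_ID "
        "ORDER BY Marketing_Cloud_Visitor_ID"
    ).fetchall()
    conn.close()
    return result


# Two rows share the key "a"; only one of them survives.
rows = [("a", 1), ("a", 3), ("b", 2)]
deduped = dedupe_by_visitor_id(rows)
print(deduped)
```

Grouping on the primary key column guarantees at most one row per key, so the insert no longer hits "Key already present" for rows coming from the same select.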

Grant Henke · ‎03-29-2019

If I understand correctly, you are talking about the logs in the configured --log_dir. By default Kudu will keep 10 log files per severity level. There is a flag to change that value, but it's currently marked as "experimental". It has been in Kudu for some time, so not changing it to stable is probably a bit of an oversight. I opened an Apache Kudu JIRA (KUDU-2754) to change it to a stable config. In the meantime, you can use the --max_log_files configuration by unlocking experimental configurations via --unlock_experimental_flags.
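
For reference, the two flags together might look like the output of the small sketch below. The helper name and the value 5 are made up for illustration; how the flags reach the Kudu processes (command line, flag file, or a management tool's configuration) depends on your deployment.

```python
def build_log_retention_flags(max_log_files):
    """Return the Kudu flags that limit retained log files.

    --max_log_files is experimental, so it has to be paired with
    --unlock_experimental_flags. The helper itself is hypothetical.
    """
    if max_log_files < 0:
        raise ValueError("max_log_files must not be negative")
    return [
        "--unlock_experimental_flags",
        f"--max_log_files={max_log_files}",
    ]


# Example: keep 5 files per severity level instead of the default 10.
flags = build_log_retention_flags(5)
print(" ".join(flags))
```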

Last Visited	‎07-14-2020 11:08 PM

Member Since	‎10-15-2015 07:15 AM
Posts	4
Kudos received	2

Cloudera Community

Re: Issue of copying data from kudu to hdfs using ...

Re: Limit Kudu logs

Re: WARNINGS: Error applying Kudu Op.: Incomplete:...
