Member since
10-15-2015
4
Posts
2
Kudos Received
2
Solutions
My Accepted Solutions
Title | Views | Posted |
---|---|---|
1814 | 11-13-2019 06:17 AM | |
3467 | 03-29-2019 09:11 AM |
11-13-2019
06:17 AM
1 Kudo
Can you try explicitly casting the string value to a timestamp? I don't think Spark will push down the timestamp predicate if it's a string. This is tracked in https://issues.apache.org/jira/browse/KUDU-2821.
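To illustrate the cast, here is a minimal sketch that builds a Spark SQL filter string with an explicit `CAST(... AS TIMESTAMP)` instead of a bare string literal. The helper name `timestamp_predicate` and the column `event_time` are made up for the example, and whether the predicate actually gets pushed down still depends on your Spark and kudu-spark versions.

```python
from datetime import datetime

# Comparison operators this hypothetical helper accepts.
ALLOWED_OPS = {"=", "<", "<=", ">", ">="}


def timestamp_predicate(column: str, op: str, value: str) -> str:
    """Return a Spark SQL filter that compares `column` to an explicit TIMESTAMP.

    `value` must look like 'YYYY-MM-DD HH:MM:SS'; it is validated here so a
    malformed literal fails early instead of silently matching nothing.
    """
    if op not in ALLOWED_OPS:
        raise ValueError(f"unsupported operator: {op}")
    datetime.strptime(value, "%Y-%m-%d %H:%M:%S")  # raises ValueError if malformed
    return f"{column} {op} CAST('{value}' AS TIMESTAMP)"


# `event_time` is an assumed column name, not one from the original question.
predicate = timestamp_predicate("event_time", ">=", "2019-11-01 00:00:00")
```

The resulting string can be passed to `DataFrame.filter(predicate)` on the DataFrame read from Kudu. Checking `df.filter(predicate).explain()` should show whether the filter reached the scan.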
03-29-2019
01:42 PM
I suspect that the large number of "Key already present" errors may play a part in the initial buffer size warning. The message `Key already present in Kudu table 'impala::db.REPORTKUDU'. (1 of 9006 similar)` is telling you that you are trying to insert records whose primary key is not unique. The table you are selecting from must have multiple records for each `Marketing_Cloud_Visitor_ID`. I would suggest either writing the create table statement so that the primary key is unique, or adjusting the select statement to deduplicate the records before inserting into Kudu.
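As a rough illustration of the deduplication idea, independent of Impala or Kudu, the sketch below keeps one record per `Marketing_Cloud_Visitor_ID`. The helper `dedupe_by_key`, the `page_views` field, and the sample rows are invented for the example. Keeping the first row seen is just one possible policy, so pick whichever row makes sense for your data.

```python
def dedupe_by_key(rows, key):
    """Keep the first row seen for each distinct value of `key`."""
    first_seen = {}
    for row in rows:
        # setdefault stores the row only if this key value is new.
        first_seen.setdefault(row[key], row)
    return list(first_seen.values())


# Invented sample data: two rows share the same visitor ID.
rows = [
    {"Marketing_Cloud_Visitor_ID": "a1", "page_views": 3},
    {"Marketing_Cloud_Visitor_ID": "a1", "page_views": 5},
    {"Marketing_Cloud_Visitor_ID": "b2", "page_views": 1},
]

unique_rows = dedupe_by_key(rows, "Marketing_Cloud_Visitor_ID")
```

After this step each key appears once, so an insert would no longer hit "Key already present" for these rows. In SQL the same effect usually comes from a `GROUP BY` on the key or a `ROW_NUMBER()` filter in the select.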
03-29-2019
09:11 AM
1 Kudo
If I understand correctly, you are talking about the logs in the configured --log_dir. By default, Kudu keeps 10 log files per severity level. There is a flag to change that value, but it is currently marked as "experimental". It has been in Kudu for some time, so not promoting it to stable is probably an oversight. I opened an Apache Kudu JIRA (KUDU-2754) to change it to a stable config. In the meantime, you can set --max_log_files after unlocking experimental configurations with --unlock_experimental_flags.
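For reference, the two flags go together. The small sketch below just assembles them in the usual `--flag=value` form. The helper name and the value 20 are arbitrary choices for illustration, and where you put the flags (command line, flag file, or your management tool's advanced configuration) depends on how your Kudu daemons are started.

```python
from typing import List


def kudu_log_retention_flags(max_log_files: int) -> List[str]:
    """Return the flags needed to change how many log files Kudu keeps per severity level."""
    if max_log_files < 1:
        raise ValueError("max_log_files must be at least 1")
    return [
        # Required while --max_log_files is still tagged experimental.
        "--unlock_experimental_flags",
        f"--max_log_files={max_log_files}",
    ]


# 20 is an arbitrary example value; the default mentioned above is 10.
flags = kudu_log_retention_flags(20)
```

Both flags need to be applied to each master and tablet server whose log retention you want to change, followed by a restart of that daemon.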