05-07-2023 01:08 PM
The DataFrame was huge, and when partitioning by the 'date' column, some partitions ended up with far more data than others. The oversized partitions timed out while writing to Kudu, which is what produced this error. Try rebalancing the partitions so each holds roughly the same number of records, or split the data into smaller chunks and write them to Kudu in multiple passes.
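As a rough sketch of that rebalancing idea (the partition count of 2000 is a placeholder, not a value from the original job; the master addresses and table name are taken from the question below):

    # Spread the skewed 'date' partitions across a fixed number of
    # roughly equal-sized partitions before writing, so no single
    # Kudu write batch exceeds the RPC timeout.
    balanced_df = df.repartition(2000)

    balanced_df.write.format("org.apache.kudu.spark.kudu") \
        .option("kudu.master", "master1:port,master2:port,master3:port") \
        .option("kudu.table", f"impala::{schema}.{table}") \
        .mode("append") \
        .save()

Choosing the partition count so each partition carries only a modest slice of the 524 GB (here around 250 MB each) keeps individual Kudu write batches small enough to complete before the 30-second timeout seen in the error.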
01-24-2023 03:58 AM
An exception is thrown when I try to write a 524 GB DataFrame to Kudu. After calling this write:

    df.write.format("org.apache.kudu.spark.kudu") \
        .option("kudu.master", "master1:port,master2:port,master3:port") \
        .option("kudu.table", f"impala::{schema}.{table}") \
        .mode("append") \
        .save()

this exception is thrown:

    java.lang.RuntimeException: PendingErrors overflowed. Failed to write at least 1000 rows to Kudu; Sample errors: Timed out: cannot complete before timeout: Batch{operations=1000, tablet="0bc1e2a497ab4306b6861f81dc678d9f" [0x00000002, 0x00000003), ignoredErrors=[], rpc=KuduRpc(method=Write, tablet=0bc1e2a497ab4306b6861f81dc678d9f, attempt=26, TimeoutTracker(timeout=30000, elapsed=29585), Trace Summary(29585 ms): Sent(26), Received(26), Delayed(26), MasterRefresh(0), AuthRefresh(0), Truncated: false

The YARN log for the Spark job that throws the error is attached ("Yarn log for a spark job"), along with the error thrown when inserting data into Kudu. I really appreciate any help you can provide. Thanks in advance!