Created on 06-05-2018 07:35 AM - edited 09-16-2022 06:18 AM
I ran into an issue when using the upsert operation to update some columns of a table (I know I should use update instead).
When I pass fewer arguments, the Kudu API neither throws an exception nor inserts or updates any data.
It should throw an Exception, right?
This behaviour is not documented in the API docs.
Created 06-05-2018 02:03 PM
Hi RikG, what do you mean by passing fewer arguments to the Kudu API? Note that for row errors you must call KuduSession.getPendingErrors (https://kudu.apache.org/apidocs/org/apache/kudu/client/KuduSession.html#getPendingErrors--).
Created on 06-06-2018 02:02 AM - edited 06-06-2018 02:44 AM
No, I mean passing fewer column values than the table has columns.
I tested again, and for the arguments (column values) that I don't pass, it ignores them and just updates the existing ones.
But if the table has columns with a NOT NULL constraint, it does not throw any exception, and it should.
My table description:
    CREATE TABLE badjoras (
      house_id STRING,
      tip_id STRING,
      created_ts BIGINT,
      status STRING NOT NULL,
      status_ts BIGINT,
      visible BOOLEAN NOT NULL,
      visible_ts BIGINT,
      PRIMARY KEY (house_id, tip_id)
    )
    PARTITION BY HASH(house_id) PARTITIONS 3
    STORED AS KUDU;
What I'm doing:
    try {
        if (kc.tableExists(TABLE_BADJORAS)) {
            KuduSession session = kc.newSession();
            KuduTable table = kc.openTable(TABLE_BADJORAS);
            Upsert upst = table.newUpsert();
            PartialRow row = upst.getRow();
            // Only the primary key columns are set; the NOT NULL columns are left unset.
            row.addString("house_id", "0123456789");
            row.addString("tip_id", "987654321");
            session.apply(upst);
            // getPendingErrors() clears the pending errors, so fetch them once and reuse the result.
            RowErrorsAndOverflowStatus pending = session.getPendingErrors();
            System.out.println("overflowed " + pending.isOverflowed());
            System.out.println("size= " + pending.getRowErrors().length);
            for (RowError r : pending.getRowErrors()) {
                System.out.println("error status " + r.getErrorStatus());
                System.out.println("error Operation " + r.getOperation().toString());
                System.out.println("error TsUUID " + r.getTsUUID());
                System.out.println("error toString " + r.toString());
            }
            LOGGER.info("KUDU BADJORAS" + houseId + "+" + tId + " upserted");
            session.close();
        }
    } catch (KuduException e) {
        e.printStackTrace();
    } finally {
        LOGGER.info("KUDU BADJORAS" + houseId + " run time = " + (System.currentTimeMillis() - start));
    }
With my sample code, when a row is upserted into a table that has NOT NULL columns, it never throws a KuduException, so it never enters the catch clause.
I've even tried to catch some errors like you recommended, but I got no errors when using such an upsert operation.
Created 06-06-2018 03:16 AM
Actually, I've found that the correct way to catch such errors is to check the RowError on the OperationResponse returned by the session.apply() method.

    OperationResponse op = session.apply(upst);
    if (op.hasRowError()) {
        RowError r = op.getRowError();
        String str = "error status " + r.getErrorStatus()
                + "\nerror Operation " + r.getOperation().toString()
                + "\nerror TsUUID " + r.getTsUUID()
                + "\nerror toString " + r.toString();
        LOGGER.error(str);
    }

The other way I described in my previous post didn't catch any errors, which is also bad behaviour.
If this is the correct behaviour, then it should be documented.
Like they say: if it is documented, it is a feature; otherwise it is a bug!
Created 06-08-2018 01:01 PM
Hey @RikG, you are correct. The way errors are communicated depends on the configured FlushMode. In your example the FlushMode is not set, so the default, AUTO_FLUSH_SYNC, applies. In AUTO_FLUSH_SYNC mode, any per-row errors are returned immediately as part of the OperationResponse, since the write happens synchronously. In AUTO_FLUSH_BACKGROUND mode, it's necessary to call getPendingErrors. The docs on AUTO_FLUSH_BACKGROUND cover this to some extent.
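To make the difference between the two flush modes concrete, here is a minimal sketch of checking for row errors in each mode. It is an illustration, not a definitive implementation: the class and method names, the master address "kudu-master:7051", and the table name "impala::default.badjoras" are all assumptions (the thread never shows the value of TABLE_BADJORAS), and it needs the kudu-client library plus a reachable cluster to run.

```java
import org.apache.kudu.client.KuduClient;
import org.apache.kudu.client.KuduException;
import org.apache.kudu.client.KuduSession;
import org.apache.kudu.client.KuduTable;
import org.apache.kudu.client.OperationResponse;
import org.apache.kudu.client.PartialRow;
import org.apache.kudu.client.RowError;
import org.apache.kudu.client.RowErrorsAndOverflowStatus;
import org.apache.kudu.client.SessionConfiguration;
import org.apache.kudu.client.Upsert;

public class UpsertErrorHandling {

    // Builds the same partial upsert as in the thread: only the key columns are set.
    static Upsert keyOnlyUpsert(KuduTable table) {
        Upsert upsert = table.newUpsert();
        PartialRow row = upsert.getRow();
        row.addString("house_id", "0123456789");
        row.addString("tip_id", "987654321");
        return upsert;
    }

    // AUTO_FLUSH_SYNC (the default): apply() waits for the write to complete,
    // so a per-row error is available on the returned OperationResponse.
    static void upsertSync(KuduClient client, KuduTable table) throws KuduException {
        KuduSession session = client.newSession();
        try {
            session.setFlushMode(SessionConfiguration.FlushMode.AUTO_FLUSH_SYNC);
            OperationResponse response = session.apply(keyOnlyUpsert(table));
            if (response.hasRowError()) {
                System.err.println("sync row error: " + response.getRowError());
            }
        } finally {
            session.close();
        }
    }

    // AUTO_FLUSH_BACKGROUND: apply() only buffers the operation, so row errors
    // are collected by the session and must be read with getPendingErrors().
    static void upsertBackground(KuduClient client, KuduTable table) throws KuduException {
        KuduSession session = client.newSession();
        try {
            session.setFlushMode(SessionConfiguration.FlushMode.AUTO_FLUSH_BACKGROUND);
            session.apply(keyOnlyUpsert(table));
            session.flush(); // push buffered operations before inspecting errors

            // getPendingErrors() clears the error buffer, so call it once and keep the result.
            RowErrorsAndOverflowStatus pending = session.getPendingErrors();
            if (pending.isOverflowed()) {
                System.err.println("error buffer overflowed; some row errors were dropped");
            }
            for (RowError error : pending.getRowErrors()) {
                System.err.println("background row error: " + error.getErrorStatus());
            }
        } finally {
            session.close();
        }
    }

    public static void main(String[] args) throws KuduException {
        // Hypothetical master address and table name; replace with your own.
        KuduClient client = new KuduClient.KuduClientBuilder("kudu-master:7051").build();
        try {
            KuduTable table = client.openTable("impala::default.badjoras");
            upsertSync(client, table);
            upsertBackground(client, table);
        } finally {
            client.close();
        }
    }
}
```

In both methods the row error is reported as data rather than thrown, which is why the catch (KuduException e) block in the original sample was never entered.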