The Kudu team is happy to announce the availability of Kudu 0.7.0. Kudu is currently undergoing Incubation at the Apache Software Foundation, with a beta available from Cloudera.
Kudu 0.7.0 includes improvements to client interfaces and metrics, and fixes for several key issues. We’re also releasing a refresh of the Impala Kudu parcel.
New features and improvements:
The Python client has been completely rewritten, with a focus on improving code quality and testing. The read path (scanners) has been improved by adding many of the features already supported by the C++ and Java clients. The Python client is no longer considered experimental.
KUDU-1321 Spark integration: a new API, kuduRDD, has been added. It wraps newAPIHadoopRDD and includes a default source for Spark SQL.
KUDU-1250 The Java client includes new methods countPendingErrors() and getPendingErrors() on KuduSession. These methods allow you to count and retrieve outstanding row errors when a session is configured with AUTO_FLUSH_BACKGROUND.
New server-level metrics allow you to monitor CPU usage and context switching.
KUDU-1317 The master now attempts to spread tablets more evenly across the cluster during table creation. This has no impact on existing tables, but will improve the speed at which under-replicated tablets are re-replicated after a tablet server failure.
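To illustrate why even placement matters for KUDU-1317, here is a minimal sketch of a greedy "least-loaded first" placement policy. It is not the master's actual algorithm, which this announcement does not describe. The function name `place_tablets`, its parameters, and the tablet server names are hypothetical and exist only for this example.

```python
def place_tablets(servers, num_tablets, replication=3, existing_load=None):
    """Assign replicas for each new tablet to the least-loaded servers.

    Illustrative only: a greedy policy assumed for this sketch,
    not the placement logic used by the Kudu master.
    """
    if replication > len(servers):
        raise ValueError("replication factor exceeds number of servers")

    # Start from the replica counts the servers already hold (default 0).
    load = {s: 0 for s in servers}
    for server, count in (existing_load or {}).items():
        load[server] = count

    placement = []
    for _ in range(num_tablets):
        # Pick `replication` distinct servers with the fewest replicas;
        # ties are broken by server name so the result is deterministic.
        chosen = sorted(servers, key=lambda s: (load[s], s))[:replication]
        for s in chosen:
            load[s] += 1
        placement.append(chosen)
    return placement, load


if __name__ == "__main__":
    servers = ["ts1", "ts2", "ts3", "ts4"]
    placement, load = place_tablets(servers, num_tablets=4)
    for i, replicas in enumerate(placement):
        print(f"tablet {i}: {replicas}")
    print("replicas per server:", load)
```

When replicas are spread this way, the copies of a failed server's tablets live on many different peers, so re-replication work can be shared across the cluster rather than concentrated on a few servers.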
Notable bug fixes:
KUDU-1288 fixes a file descriptor leak, which could previously only be worked around by restarting the tablet server.
KUDU-1250 fixes a hang in the Java client that occurred when processing an in-flight batch after the previous batch had encountered an error.
Notable improvements and bug fixes in the new Impala Kudu:
Note: this release is still based on last summer’s pre-C5.5 fork of Impala and doesn’t have new features like nested types.
IMPALA-2635 fixes a common bug seen when COMPUTE STATS had not been run and the query contained a UNION.
IMPALA-2740 fixes a bug reported on the mailing list where NULL values could be mishandled.
KUDU-1184 is actually a fix on the Impala side: new Kudu tables created via Impala now default to the Kudu master’s default replication value instead of always using 1.
A change was also made to leverage the new C++ Scan API introduced in KUDU-1259, which speeds up queries that handle small rows, especially row counts.
As always, your feedback is appreciated. For general Kudu questions, please visit the community page. If you have any questions related to Kudu in a Cloudera context, please visit the Cloudera Community Forum.