Inconsistency in documentation/request for clarification of version compatibility


New Contributor

Hi,

 

I was going through some of the Kudu Spark documentation for CDH 6.1.x, located here:

https://www.cloudera.com/documentation/enterprise/6/6.1/topics/kudu_development.html

Now, I would like to use the upsert ignoreNull option that is described there. That page also mentions that kudu-spark is available up to version 1.7, but as far as I can tell this option is only available starting from kudu-spark version 1.8. Compare the Kudu documentation on the Apache website for 1.8 versus 1.7:

https://kudu.apache.org/releases/1.8.0/docs/developing.html

https://kudu.apache.org/releases/1.7.1/docs/developing.html

 

I have also tried using both kudu-spark 1.7 and 1.8, and as expected the upsert ignoreNull option is only available from 1.8 onwards.

 

This leads me to my question: for CDH 6.1.x, which versions of kudu-spark are supported? Up to 1.7, or up to 1.8?

In either case, an update to the official Cloudera documentation might be in order, so that it consistently reflects the available functionality and the supported versions.

2 REPLIES

Re: Inconsistency in documentation/request for clarification of version compatibility

Contributor

Hi,

 

Thank you for reporting the issue!

 

With CDH 6.1.0, kudu-spark2_2.11-1.8.0-cdh6.1.0.jar is available:

 https://archive.cloudera.com/cdh6/6.1.0/maven-repository/org/apache/kudu/kudu-spark2_2.11/1.8.0-cdh6...

 

However, applications can also use kudu-spark2_2.11-1.7.0 against the Kudu server side of CDH 6.1.0 (i.e. the older version of kudu-spark2_2.11 is 'supported' at least in this sense).

 

Yes, you are right: in the Apache Kudu git repo, the UPSERT ignoreNull option is available in Kudu 1.8.0 and onward. For CDH, the UPSERT ignoreNull option is available starting with kudu-spark2_2.11-1.8.0; it is not available in older versions (i.e. kudu-spark2_2.11-1.7.0 doesn't have it).
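
For anyone unsure what the option changes, here is a minimal sketch of the semantics as the Apache Kudu documentation describes them: with ignoreNull enabled, a null value in the incoming row leaves the existing column value alone instead of overwriting it with null. This is a toy in-memory model in plain Python, not the kudu-spark API; the `upsert` function, the `ignore_null` parameter, and the dict-based "table" are all hypothetical names made up for illustration.

```python
from typing import Any, Dict, Optional

# A row maps column names to values; None stands in for SQL NULL.
Row = Dict[str, Optional[Any]]


def upsert(table: Dict[Any, Row], row: Row, key: str = "id",
           ignore_null: bool = False) -> None:
    """Toy model of an upsert into an in-memory table keyed by primary key.

    Hypothetical helper for illustration only; it does not call Kudu.
    """
    existing = table.get(row[key])
    if existing is None:
        # No row with this key yet: the upsert behaves like an insert.
        table[row[key]] = dict(row)
        return
    for column, value in row.items():
        if value is None and ignore_null:
            # ignore_null: keep the stored value instead of writing NULL.
            continue
        existing[column] = value


table = {1: {"id": 1, "name": "alice", "age": 30}}

# With ignore_null, the null "name" does not clobber the stored value.
upsert(table, {"id": 1, "name": None, "age": 31}, ignore_null=True)
print(table[1])

# Default behavior: the null is written and "name" becomes None.
upsert(table, {"id": 1, "name": None, "age": 32})
print(table[1])
```

In this model the first call only changes `age`, while the second call also sets `name` to null, which is the difference the option is meant to control.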

 

I'll try to reach out to see whether the inconsistency you pointed out can be fixed in the CDH 6.1.0 online documentation.

 

 

Thanks,

 

Alexey

Re: Inconsistency in documentation/request for clarification of version compatibility

New Contributor

Thank you for the clarification, Alexey. Much appreciated.

Edit: come to think of it, do you know anything about the relative efficiency of an upsert with ignoreNull versus retrieving a dataframe from the table, making my modifications in memory, and then upserting? Does Kudu/Spark do something similar under the hood, so that little performance gain is to be expected, or is an update with ignoreNull really a less "expensive" operation?
