PySpark - Kudu API/command reference

Explorer

Hi,

I have been searching for some time for a command reference/API manual for PySpark-Kudu, and I have been unsuccessful so far. Does Cloudera have something that can help?

Thanks.


Contributor

I'm going to take a poke at this and hope I'm not wasting your time...

 

This looks like a good start:

https://kudu.apache.org/docs/developing.html

 

I have done minimal Python development using PySpark and Kudu.  It's not too bad...
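For reference, plain reads and writes from PySpark go through the kudu-spark DataFrame source rather than a Python KuduContext. The sketch below assumes the kudu-spark jar is on the Spark classpath and that the target table already exists; the helper names (`kudu_options`, `read_kudu_table`, `append_to_kudu_table`) are hypothetical, and the "append only" restriction reflects my understanding of the connector, so check the linked developing docs for your Kudu version.

```python
from __future__ import annotations

from typing import TYPE_CHECKING, Dict

if TYPE_CHECKING:  # used only for type hints; pyspark is not imported at runtime here
    from pyspark.sql import DataFrame, SparkSession

# Fully qualified name of the kudu-spark data source.
KUDU_FORMAT = "org.apache.kudu.spark.kudu"


def kudu_options(master: str, table: str) -> Dict[str, str]:
    """Build the two options the kudu-spark source needs: master address(es) and table name."""
    return {"kudu.master": master, "kudu.table": table}


def read_kudu_table(spark: SparkSession, master: str, table: str) -> DataFrame:
    """Load an existing Kudu table as a Spark DataFrame (hypothetical helper)."""
    return spark.read.format(KUDU_FORMAT).options(**kudu_options(master, table)).load()


def append_to_kudu_table(df: DataFrame, master: str, table: str) -> None:
    """Insert the rows of df into an existing Kudu table (hypothetical helper).

    Assumption: the connector's DataFrame writer accepts only the "append"
    save mode, so the table must be created beforehand (e.g. via Impala).
    """
    df.write.format(KUDU_FORMAT).options(**kudu_options(master, table)).mode("append").save()
```

Usage would look like `df = read_kudu_table(spark, "kudu-master-1:7051", "impala::default.my_table")`, where the master address and table name are placeholders. Tables created through Impala are typically exposed to Kudu under an `impala::` prefixed name.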

Explorer

Could you please share how KuduContext is created in PySpark?

 

I am aware of KUDU-1603, but I am looking for workarounds, and the weird Java wrapper detailed in KUDU-1603 is not working as intended.
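For readers unfamiliar with the "Java wrapper" approach: the usual trick is to construct the Scala `KuduContext` through PySpark's py4j gateway and call it as a JVM proxy. The sketch below is an illustration under stated assumptions, not a supported Python API: it relies on the private `_jvm`/`_jsc` attributes of `SparkContext`, assumes the kudu-spark jar is on the driver classpath, and assumes a `KuduContext(String, SparkContext)` constructor exists in your Kudu version. The helper name `jvm_kudu_context` is hypothetical.

```python
from __future__ import annotations

from typing import TYPE_CHECKING, Any

if TYPE_CHECKING:  # type hints only; pyspark is not imported at runtime here
    from pyspark.sql import SparkSession


def jvm_kudu_context(spark: SparkSession, master: str) -> Any:
    """Return a py4j proxy for the Scala KuduContext (hypothetical helper).

    Every method call on the returned object is forwarded to the JVM.
    Assumes kudu-spark is on the driver classpath and that the
    KuduContext(String, SparkContext) constructor is available.
    """
    sc = spark.sparkContext
    # _jvm (py4j view of the driver JVM) and _jsc (JavaSparkContext) are
    # private PySpark attributes and may change between Spark releases.
    scala_sc = sc._jsc.sc()  # unwrap JavaSparkContext into the Scala SparkContext
    return sc._jvm.org.apache.kudu.spark.kudu.KuduContext(master, scala_sc)
```

Simple calls such as `kc.tableExists("impala::default.my_table")` should return a Python bool through py4j. Methods that take a DataFrame need the JVM object (`df._jdf`), and py4j does not fill in Scala default arguments, so any method declared with defaults must be called with every argument; that may be part of why the wrapper misbehaves.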

Contributor

Hi,

 

I'm not sure there is full-fledged documentation for the Kudu PySpark API: if I'm not mistaken, the connector is still in an early phase of development.

 

However, the following in-flight patch has a few examples that might be helpful:

  https://gerrit.cloudera.org/#/c/13102/2/docs/developing.adoc

 

But that doesn't answer your question about KuduContext: I'm not sure that functionality is implemented at this point. There was a WIP patch posted some time ago: https://gerrit.cloudera.org/#/c/13086/  Unfortunately, I don't know what the status of that work is at this point.

Contributor

Whoops, the correct link to the WIP patch for the PySpark integration work is:

  http://gerrit.cloudera.org:8080/13088

Explorer

Yes, that WIP patch links back to KUDU-1603, which I mentioned earlier. I guess we will have to wait it out. Thanks for your response.
