Created on 08-15-2019 10:03 AM - last edited on 08-15-2019 12:43 PM by cjervis
Hi
I have been searching for some time for a command reference/API manual for PySpark-Kudu, and so far I have been unsuccessful. Does Cloudera have something that could help?
Thanks.
Created 08-15-2019 12:11 PM
I'm going to take a poke at this and hope I'm not wasting your time...
This looks like a good start:
https://kudu.apache.org/docs/developing.html
I have done minimal Python development using PySpark and Kudu. It's not too bad...
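For reference, reading from and writing to Kudu through the kudu-spark DataFrame source in PySpark generally looks something like the sketch below. It assumes the kudu-spark jar that matches your Spark/Scala version is already on the classpath (for example via `--packages` or `--jars`). The helper names `kudu_options`, `read_kudu_table` and `append_to_kudu_table` are hypothetical conveniences, not part of any official Kudu Python API.

```python
# Long-form data source name registered by the kudu-spark connector.
# Assumption: the kudu-spark jar is on the Spark driver/executor classpath.
KUDU_FORMAT = "org.apache.kudu.spark.kudu"


def kudu_options(master, table):
    """Build the two options the Kudu data source needs (hypothetical helper)."""
    return {"kudu.master": master, "kudu.table": table}


def read_kudu_table(spark, master, table):
    """Load an existing Kudu table into a PySpark DataFrame."""
    reader = spark.read.format(KUDU_FORMAT)
    for key, value in kudu_options(master, table).items():
        reader = reader.option(key, value)
    return reader.load()


def append_to_kudu_table(df, master, table):
    """Write a DataFrame's rows into an existing Kudu table.

    The Kudu data source is, as far as I know, append-only for
    DataFrame writes; other save modes are rejected.
    """
    writer = df.write.format(KUDU_FORMAT).mode("append")
    for key, value in kudu_options(master, table).items():
        writer = writer.option(key, value)
    writer.save()
```

Usage, with a hypothetical master address and table name: `df = read_kudu_table(spark, "kudu-master.example.com:7051", "impala::default.my_table")`. Tables created through Impala usually carry the `impala::` prefix in their Kudu table name.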
Created 08-15-2019 02:12 PM
Could you please share how KuduContext is created in PySpark?
I am aware of KUDU-1603, but I'm looking for workarounds; the weird Java wrapper described in KUDU-1603 is not working as intended.
Created 08-22-2019 01:18 PM
Hi,
I'm not sure there is full-fledged documentation for the Kudu PySpark API: if I'm not mistaken, the connector is still in an early development phase.
However, the following in-flight patch has a few examples that might be helpful:
https://gerrit.cloudera.org/#/c/13102/2/docs/developing.adoc
That patch doesn't answer your question about KuduContext, though: I'm not sure that functionality is implemented at this point. There was a WIP patch posted some time ago (https://gerrit.cloudera.org/#/c/13086/), but unfortunately I don't know what the status of that work is at this point.
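On the KuduContext question: without a native Python wrapper, the only route I'm aware of is to reach into the JVM through PySpark's Py4J gateway and construct the Scala `org.apache.kudu.spark.kudu.KuduContext` directly. The sketch below is untested here, relies on private PySpark attributes (`_jvm`, `_jsc`), and uses hypothetical helper names (`make_kudu_context`, `kudu_table_exists`). Given that the Java-wrapper approach from KUDU-1603 reportedly did not work as intended, treat it as an experiment rather than a supported API.

```python
def make_kudu_context(spark, kudu_master):
    """Instantiate the Scala KuduContext through Py4J (hypothetical helper).

    Assumptions: the kudu-spark jar is on the driver classpath, and
    KuduContext exposes a (kuduMaster: String, sc: SparkContext)
    constructor. The result is a Py4J proxy to a JVM object.
    """
    sc = spark.sparkContext
    jvm = sc._jvm              # private Py4J view of the driver JVM
    scala_sc = sc._jsc.sc()    # unwrap JavaSparkContext -> Scala SparkContext
    return jvm.org.apache.kudu.spark.kudu.KuduContext(kudu_master, scala_sc)


def kudu_table_exists(kudu_context, table):
    """Call KuduContext.tableExists(String) and return a Python bool."""
    return bool(kudu_context.tableExists(table))
```

One caveat with this style of wrapper: Py4J goes through Java reflection, which does not see Scala default arguments, so any KuduContext method whose signature relies on defaults generally has to be called with every argument spelled out.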
Created 08-22-2019 01:20 PM
Whoops, the correct link to the WIP patch for PySpark integration work is
Created 08-22-2019 01:50 PM
Yes, that WIP patch links back to KUDU-1603, which I mentioned earlier. I guess we will have to wait it out. Thanks for your response.