Created on 07-05-2018 09:04 AM - edited 09-16-2022 08:50 AM
Hi, I am working with Kudu and Oracle. I have more than 5 million records and have been asked to read them from Oracle and write them into a Kudu table. What I did: I opened an ojdbc connection, fetched the records from Oracle, and inserted them into the Kudu table using PartialRow and the insert method. I just want to know whether I can do bulk inserts to avoid spending so much time on writes.
Created 07-05-2018 09:44 AM
Can I do a bulk insert? If so, please tell me how.
Created 07-05-2018 10:18 AM
If I do the writes as per the program given in https://github.com/cloudera/kudu-examples/tree/master/java/java-sample/src/main/java/org/kududb/exam...
it takes an hour to insert the data into the Kudu table.
How can I insert the records in less time?
Created 07-05-2018 03:27 PM
One option is to export the data to Parquet on HDFS using Sqoop, then use Impala to run a CREATE TABLE ... AS SELECT * FROM the Parquet table to populate your Kudu table.
Unfortunately, Sqoop does not have support for Kudu at this time.
Created 07-10-2018 01:48 PM
Thank you, but I just want to use Java and do batch inserts. Is there any way to perform faster writes to a Kudu table using Java?
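For the archive: the Kudu Java client supports batching natively through its session flush modes, which is usually the fix for slow per-row inserts. A minimal sketch, assuming a reachable Kudu master at `kudu-master:7051` and a table `my_table` with `id`/`name` columns (all hypothetical names) — `AUTO_FLUSH_BACKGROUND` buffers applied rows client-side and sends them in batches instead of one RPC per insert:

```java
import org.apache.kudu.client.*;

public class KuduBatchInsert {
    public static void main(String[] args) throws KuduException {
        // Hypothetical master address, table, and column names.
        KuduClient client = new KuduClient.KuduClientBuilder("kudu-master:7051").build();
        try {
            KuduTable table = client.openTable("my_table");
            KuduSession session = client.newSession();
            // Buffer rows and flush them in batches in the background,
            // instead of one round trip per row (the default AUTO_FLUSH_SYNC).
            session.setFlushMode(SessionConfiguration.FlushMode.AUTO_FLUSH_BACKGROUND);
            session.setMutationBufferSpace(10000); // rows buffered per batch

            for (int i = 0; i < 1_000_000; i++) {
                Insert insert = table.newInsert();
                PartialRow row = insert.getRow();
                row.addInt("id", i);
                row.addString("name", "row-" + i);
                session.apply(insert); // returns immediately; sent in batches
            }
            session.flush(); // push any remaining buffered rows

            // With background flush, apply() no longer reports per-row
            // failures synchronously, so check for pending errors here.
            if (session.countPendingErrors() > 0) {
                for (RowError e : session.getPendingErrors().getRowErrors()) {
                    System.err.println(e);
                }
            }
            session.close();
        } finally {
            client.close();
        }
    }
}
```

The same loop structure as the linked java-sample applies; only the flush mode and the post-flush error check change, and that is typically what turns an hour of per-row writes into minutes.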
Created 05-24-2019 01:27 PM
Hi,
I need to load data from a Hive table into a Kudu table using PySpark code. I am able to insert one record using table.new_insert, but I am not able to load all the records at once. The approach I am looking at is to get the data into a DataFrame and load that DataFrame into the Kudu table. I found an example using Java but not Python. Will you please help?
Thx.
Created 05-30-2019 09:29 AM
Hi,
I don't know much about Kudu + PySpark except that there is a lot of room for improvement there, but a couple of examples in the following patch-in-flight may be useful: https://gerrit.cloudera.org/#/c/13102/
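For the record, the DataFrame route asked about above can be sketched with the kudu-spark integration, assuming the `org.apache.kudu:kudu-spark2` package is on the Spark classpath; the master address and table names below are hypothetical:

```python
def write_df_to_kudu(df, kudu_masters, kudu_table):
    """Append a Spark DataFrame to an existing Kudu table in one bulk write.

    df           -- a pyspark.sql.DataFrame whose schema matches the Kudu table
    kudu_masters -- comma-separated Kudu master addresses, e.g. "kudu-master:7051"
    kudu_table   -- Kudu table name; Impala-created tables are usually
                    named like "impala::db_name.table_name"
    """
    (df.write
       .format("org.apache.kudu.spark.kudu")
       .option("kudu.master", kudu_masters)
       .option("kudu.table", kudu_table)
       .mode("append")
       .save())

# Hypothetical usage: read from Hive, bulk-write to Kudu.
# spark = SparkSession.builder.enableHiveSupport().getOrCreate()
# df = spark.sql("SELECT * FROM db_name.hive_table")
# write_df_to_kudu(df, "kudu-master:7051", "impala::db_name.kudu_table")
```

This writes the whole DataFrame in one pass, so there is no per-record new_insert loop on the driver.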
Created 01-07-2020 12:35 AM
I am able to Sqoop the data from Oracle to HDFS and then run a CREATE TABLE AS SELECT * FROM ... on Impala to write into Kudu. I am able to run the queries manually, but what is the best way to automate this when I move the code to production?
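A common way to automate the two steps above is to wrap them in a script and schedule it with cron, Oozie, or whatever scheduler the cluster uses. A rough sketch, with all connection strings, paths, and table names hypothetical — note that after the initial CREATE TABLE AS SELECT creates the Kudu table, recurring runs would use INSERT INTO (or UPSERT INTO, for safe reruns) instead:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical staging location for the Sqoop export.
STAGING_DIR=/user/etl/staging/src_table

# 1. Pull from Oracle into Parquet on HDFS.
hdfs dfs -rm -r -f "$STAGING_DIR"
sqoop import \
  --connect jdbc:oracle:thin:@//oracle-host:1521/ORCL \
  --username etl_user --password-file /user/etl/.oracle_pw \
  --table SRC_TABLE \
  --as-parquetfile \
  --target-dir "$STAGING_DIR" \
  --num-mappers 8

# 2. Refresh the staging table's metadata and copy into Kudu via Impala.
impala-shell -i impala-host -q "
  REFRESH staging_db.src_table;
  INSERT INTO kudu_db.dest_table SELECT * FROM staging_db.src_table;
"
```

The script exits nonzero if either step fails (set -e), so the scheduler can alert or retry rather than silently loading partial data.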