07-10-2018 01:46 PM - last edited on 07-11-2018 05:50 AM by cjervis
I have a Kudu table with more than a million records, and I have been asked to do some query performance testing through both impala-shell and Java. Through impala-shell I am able to perform all the queries, and it gives me timestamps from which I can tell how long each query took. But when it comes to performing queries using the Kudu Java API: is it possible to perform a join query using the Kudu Java API and benchmark the results?
07-10-2018 01:55 PM
I moved this from the Cloudera Manager message board to the correct board, but it appears this is a duplicate of
07-12-2018 09:32 AM
Kudu is only a storage engine. If you want sophisticated query processing capabilities, you have to use a query engine on top of Kudu that integrates with it. Mainly that would be Impala or Spark. You can use JDBC or the Spark APIs to access those systems from Java.
Here is how to use Impala with Kudu: https://www.cloudera.com/documentation/enterprise/5-14-x/topics/kudu_impala.html
Here is a blog article showing how to use Spark with Kudu in Java: https://blog.cloudera.com/blog/2017/02/up-and-running-with-apache-spark-on-apache-kudu/
Does that answer your question?
07-13-2018 06:45 AM
07-13-2018 03:17 PM
The only way I know of to do complex queries through Java is to use the Impala JDBC connector, which you can find here: https://www.cloudera.com/downloads/connectors/impala/jdbc/2-6-3.html
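As a minimal sketch of what that looks like, the following benchmarks a join through the Impala JDBC connector, assuming the Cloudera Impala JDBC driver is on the classpath. The table and column names (orders, customers, id, customer_id) are hypothetical, and the JDBC URL is a placeholder you pass on the command line:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class ImpalaJoinBenchmark {

    // Converts a nanosecond interval to whole milliseconds.
    static long elapsedMillis(long startNanos, long endNanos) {
        return (endNanos - startNanos) / 1_000_000L;
    }

    // Runs one query, draining the result set so the full query cost is
    // measured, and returns the elapsed wall-clock time in milliseconds.
    static long timeQuery(Connection conn, String sql) throws Exception {
        long start = System.nanoTime();
        try (Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(sql)) {
            long rows = 0;
            while (rs.next()) {
                rows++;
            }
            System.out.println(rows + " rows fetched");
        }
        return elapsedMillis(start, System.nanoTime());
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            // e.g. jdbc:impala://impalad-host:21050/default
            System.out.println("usage: ImpalaJoinBenchmark <jdbc-url>");
            return;
        }
        // Hypothetical Kudu-backed tables; any SQL that Impala accepts
        // in impala-shell works the same way over JDBC.
        String sql = "SELECT o.id, c.name FROM orders o "
                   + "JOIN customers c ON o.customer_id = c.id";
        try (Connection conn = DriverManager.getConnection(args[0])) {
            System.out.println("elapsed: " + timeQuery(conn, sql) + " ms");
        }
    }
}
```

For steadier numbers, run each query several times and discard the first run, since it pays one-time metadata-loading costs.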
07-16-2018 06:06 AM
Sorry if I am asking too many questions.
I have a big table in Oracle. Is it possible to save the data locally as a file on my system and insert it into the Kudu table?
Thank you for all your solutions.
07-16-2018 04:12 PM
If you have the data in Oracle, I would suggest writing it to Parquet on HDFS using Sqoop first. After that, you will be able to transfer the data to Kudu using Impala with a command like:
CREATE TABLE kudu_table STORED AS KUDU AS SELECT * FROM parquet_table;
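Since the thread is about driving this from Java, here is a sketch of issuing that second step over the same Impala JDBC connection. Note that recent Impala versions require a PRIMARY KEY and a partitioning clause when creating a Kudu table, so the statement below adds them; the table names, the "id" key column, and the partition count are all assumptions to adapt to your schema:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class ParquetToKudu {

    // Builds a CREATE TABLE ... AS SELECT that copies a Parquet-backed
    // Impala table into a new Kudu table. Impala requires a primary key
    // and a partitioning scheme for Kudu tables; "id" is a hypothetical
    // key column here.
    static String ctas(String kuduTable, String parquetTable) {
        return "CREATE TABLE " + kuduTable
             + " PRIMARY KEY (id)"
             + " PARTITION BY HASH (id) PARTITIONS 4"
             + " STORED AS KUDU"
             + " AS SELECT * FROM " + parquetTable;
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            // e.g. jdbc:impala://impalad-host:21050/default
            System.out.println("usage: ParquetToKudu <jdbc-url>");
            return;
        }
        try (Connection conn = DriverManager.getConnection(args[0]);
             Statement stmt = conn.createStatement()) {
            stmt.execute(ctas("kudu_table", "parquet_table"));
        }
    }
}
```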