HJ
Explorer
Posts: 12
Registered: ‎07-05-2018
Accepted Solution

Apache kudu

I have a Kudu table with more than a million records, and I have been asked to run some query performance tests through both impala-shell and Java. Through impala-shell I can run all the queries, and it reports how long each query took. When it comes to running queries through the Kudu Java API, is it possible to perform a join and benchmark it?

 

Posts: 857
Topics: 1
Kudos: 198
Solutions: 106
Registered: ‎04-22-2014

Re: Apache kudu

I moved this from the Cloudera Manager message board to the correct board, but it appears this is a duplicate of

http://community.cloudera.com/t5/Interactive-Short-cycle-SQL/Apache-kudu/m-p/69597#M4670

Posts: 857
Topics: 1
Kudos: 198
Solutions: 106
Registered: ‎04-22-2014

Re: Apache kudu

OOPS... I scanned the post earlier and made the mistake of thinking it was a duplicate!

My bad... I moved this to the right place, though, so the Kudu folks can have a look.

 

Cheers,

 

Ben

Cloudera Employee
Posts: 60
Registered: ‎04-08-2014

Re: Apache kudu

Hi HJ,

It is not possible to do a join using the native Kudu NoSQL API. You will need to use SQL with Impala or Spark SQL, or use the Spark DataFrame APIs, to do the join.

 

Mike

HJ
Explorer
Posts: 12
Registered: ‎07-05-2018

Re: Apache kudu

I would like to run the queries and benchmark them using Java. How can I do that?

Cloudera Employee
Posts: 60
Registered: ‎04-08-2014

Re: Apache kudu

Kudu is only a storage engine. If you want sophisticated query processing capabilities, you have to use a query engine on top of Kudu that has an integration. Mainly that would be Impala or Spark. You can use JDBC or Spark APIs to access those systems from Java.

 

Here is how to use Impala with Kudu: https://www.cloudera.com/documentation/enterprise/5-14-x/topics/kudu_impala.html

 

Here is a blog article showing how to use Spark with Kudu in Java: https://blog.cloudera.com/blog/2017/02/up-and-running-with-apache-spark-on-apache-kudu/

 

Does that answer your question?

HJ
Explorer
Posts: 12
Registered: ‎07-05-2018

Re: Apache kudu

Thank you for the solution.
 
I am currently using impala-shell to run various complex queries using all kinds of functions, and noting down the time each one takes, that is, how long it takes to get all the data from a table that has more than a million records.
 
I want to do the same using the Kudu Java API and benchmark the queries, that is, see how long a query takes when run through the Kudu Java API.
 
I want to come up with a comparison between queries run in impala-shell and on the Java side, so please tell me how to run queries from the Java side.
Cloudera Employee
Posts: 60
Registered: ‎04-08-2014

Re: Apache kudu

The only way I know of to do complex queries through Java is to use the Impala JDBC connector, which you can find here: https://www.cloudera.com/downloads/connectors/impala/jdbc/2-6-3.html
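
To compare against the timings reported by impala-shell, a small JDBC harness along these lines could be used. This is only a sketch under assumptions: the connection URL, the host name `impala-host`, and the table name `my_kudu_table` are placeholders, and it assumes the Impala JDBC driver jar is on the classpath so that `DriverManager` can find it. It times each run with `System.nanoTime()`, drains the whole result set so that fetch time is included, and reports the median of several runs.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.Arrays;

class ImpalaQueryBenchmark {

    /** Runs the query once, reads every row, and returns elapsed wall-clock milliseconds. */
    static long timeQueryMillis(Connection conn, String sql) throws SQLException {
        long start = System.nanoTime();
        try (Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(sql)) {
            while (rs.next()) {
                // Drain the result set so that fetching rows is part of the measurement.
            }
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    /** Median of the samples; less sensitive to a single slow run than the mean. */
    static long median(long[] samples) {
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        int mid = sorted.length / 2;
        return sorted.length % 2 == 1
                ? sorted[mid]
                : (sorted[mid - 1] + sorted[mid]) / 2;
    }

    public static void main(String[] args) throws SQLException {
        // Placeholder URL and query (assumptions); pass real values as arguments.
        String url = args.length > 0 ? args[0] : "jdbc:impala://impala-host:21050/default";
        String sql = args.length > 1 ? args[1] : "SELECT COUNT(*) FROM my_kudu_table";
        int runs = 5;
        long[] samples = new long[runs];

        try (Connection conn = DriverManager.getConnection(url)) {
            timeQueryMillis(conn, sql); // warm-up run, not recorded
            for (int i = 0; i < runs; i++) {
                samples[i] = timeQueryMillis(conn, sql);
                System.out.println("run " + (i + 1) + ": " + samples[i] + " ms");
            }
        }
        System.out.println("median: " + median(samples) + " ms");
    }
}
```

Keep in mind that the numbers measured this way include network transfer and client-side row fetching, so they will not necessarily match what impala-shell prints for the same query.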

HJ
Explorer
Posts: 12
Registered: ‎07-05-2018

Re: Apache kudu

Sorry if I am asking too many questions.

 

I have a big table in Oracle. Is it possible to save the data to a file locally on my system and insert it into the Kudu table?

 

Thank you for all your solutions.

Cloudera Employee
Posts: 60
Registered: ‎04-08-2014

Re: Apache kudu

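If you have the data in Oracle, I would suggest writing it to Parquet on HDFS using Sqoop first. After that, you will be able to transfer the data to Kudu using Impala with a command like CREATE TABLE kudu_table PRIMARY KEY (id) PARTITION BY HASH (id) PARTITIONS 4 STORED AS KUDU AS SELECT * FROM parquet_table; (Kudu tables need a primary key; id and the partition count here are placeholders for your own schema.)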
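
Since the rest of this thread is about driving Impala from Java, the same statement could also be issued over JDBC. The following is a rough sketch, not a tested recipe: the connection URL, the key column `id`, and the partition count are assumptions for illustration, and the helper `buildCtas` is a hypothetical name. It adds the PRIMARY KEY and PARTITION BY clauses that Impala expects when creating a Kudu table.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.Arrays;
import java.util.List;

class KuduCtasLoader {

    /**
     * Builds a CREATE TABLE ... AS SELECT statement for a Kudu target table.
     * Hypothetical helper: the key columns are used for both the primary key
     * and the hash partitioning.
     */
    static String buildCtas(String kuduTable, String sourceTable,
                            List<String> keyColumns, int hashPartitions) {
        String keys = String.join(", ", keyColumns);
        return "CREATE TABLE " + kuduTable
                + " PRIMARY KEY (" + keys + ")"
                + " PARTITION BY HASH (" + keys + ") PARTITIONS " + hashPartitions
                + " STORED AS KUDU AS SELECT * FROM " + sourceTable;
    }

    public static void main(String[] args) throws SQLException {
        // Placeholder URL (assumption); requires the Impala JDBC driver on the classpath.
        String url = args.length > 0 ? args[0] : "jdbc:impala://impala-host:21050/default";

        // SELECT * only works if the key columns come first in parquet_table;
        // otherwise list the columns explicitly with the key columns first.
        String ddl = buildCtas("kudu_table", "parquet_table", Arrays.asList("id"), 4);
        System.out.println(ddl);

        try (Connection conn = DriverManager.getConnection(url);
             Statement stmt = conn.createStatement()) {
            stmt.execute(ddl);
        }
    }
}
```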
