Created on 05-17-2025 03:33 AM - edited 05-17-2025 04:32 AM
Hi all,
I'm working with Cloudera Hive for the first time, seen that my company chose to use this database.
I received the JDBC connectionString and I created a little method in java.
Here the script
public String[] getEmployee(){
String[] idcod ={"ABC123", "FLS163","XYZ001","PLE456", "ERV021", and so on......};
String[] names= new String[idcod.length];
int ncount=0;
long start1, elapsedTimequery;
try{
Statement s = conn.createStatement();
System.out.println("Fetch size: "+s.getFetchSize());
ResultSet r=null;
for(int i=0; i<idcod.length; i++) {
String query="";
query+=" SELECT code, surname, name FROM mytable where code='"+idcod[i]+"'";
ncount=0;
start1 = System.nanoTime();
r = s.executeQuery(query);
elapsedTimequery = System.nanoTime() - start1;
System.out.println((double)(elapsedTimequery/1000/1000)/1000+" seconds");
while(r.next()){
names[i]=r.getString("surname") + " " +r.getString("name");
ncount++;
}
System.out.println(ncount+"--"+idcod[i]);
}
r.close();
s.close();
}
catch(SQLException sqlex){
sqlex.printStackTrace();
}
catch(Exception ex){
ex.printStackTrace();
}
return names;
}
For each query r = s.executeQuery(query) it takes more of 4 seconds to search data and got it, meanwhile with Oracle a query takes about 2 milliseconds with Fetch "10".
With technical referent tried ICEBERG Table but it's no change.
I tried to add "hive.fetch.task.conversion" to my connectionString with all possible values (none, minimal, more) with no change.
I can't use clause IN because my main program manage the codes row by row.
How can I avoid all this time for an hive query and to go near to Oracle timing?
Thank you to all.
Created 05-19-2025 03:02 AM
Hive is not a low-latency OLTP Database like Oracle.
From the code snippet observed , queries executing row by row. (i.e.) executeQuery() multiple times , it looks expensive.
hive.fetch.task.conversion won't help here, since it will be useful for optimizing simple SELECT's into client-side fetches, but Hive still builds a full plan behind the scenes .
Better approach would be , Refactor the loop into a single IN clause.
SELECT code, surname, name FROM mytable WHERE code IN ('ABC123', 'FLS163', 'XYZ001', ...)
Then store the results in a map.
Map<String, String> codeToName = new HashMap<>();
while (r.next()) {
codeToName.put(r.getString("code"), r.getString("surname") + " " + r.getString("name"));
}
Even if you must process row-by-row , fetching all data in a batch drastically reduces query overhead.
If the list is too large for IN clause, insert those values in temp Hive table.
// Insert your id list into a temp table
CREATE TEMPORARY TABLE tmp_ids (code STRING);
-- Then insert all your codes into tmp_ids
SELECT a.code, a.surname, a.name
FROM mytable a
JOIN tmp_ids b ON a.code = b.code;
Hive optimize the join rather than executing multiple separate queries.
Created 05-19-2025 03:02 AM
Hive is not a low-latency OLTP Database like Oracle.
From the code snippet observed , queries executing row by row. (i.e.) executeQuery() multiple times , it looks expensive.
hive.fetch.task.conversion won't help here, since it will be useful for optimizing simple SELECT's into client-side fetches, but Hive still builds a full plan behind the scenes .
Better approach would be , Refactor the loop into a single IN clause.
SELECT code, surname, name FROM mytable WHERE code IN ('ABC123', 'FLS163', 'XYZ001', ...)
Then store the results in a map.
Map<String, String> codeToName = new HashMap<>();
while (r.next()) {
codeToName.put(r.getString("code"), r.getString("surname") + " " + r.getString("name"));
}
Even if you must process row-by-row , fetching all data in a batch drastically reduces query overhead.
If the list is too large for IN clause, insert those values in temp Hive table.
// Insert your id list into a temp table
CREATE TEMPORARY TABLE tmp_ids (code STRING);
-- Then insert all your codes into tmp_ids
SELECT a.code, a.surname, a.name
FROM mytable a
JOIN tmp_ids b ON a.code = b.code;
Hive optimize the join rather than executing multiple separate queries.
Created 05-26-2025 06:35 AM
Too simple 😋
I can't use clause IN as already written in my first post.
Then my credential are "read only," I can't open a temporary tables.
Created 05-19-2025 03:10 AM
Would like to mention few more recommendations ,