<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Hive little query slow in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/Hive-little-query-slow/m-p/408707#M252769</link>
    <description>&lt;P&gt;Too simple&amp;nbsp;&lt;span class="lia-unicode-emoji" title=":face_savoring_food:"&gt;😋&lt;/span&gt;&lt;BR /&gt;I can't use an IN clause, as I already wrote in my first post.&lt;BR /&gt;Also, my credentials are read-only, so I can't create temporary tables.&lt;/P&gt;</description>
    <pubDate>Mon, 26 May 2025 13:35:43 GMT</pubDate>
    <dc:creator>Dariuz82</dc:creator>
    <dc:date>2025-05-26T13:35:43Z</dc:date>
    <item>
      <title>Hive little query slow</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Hive-little-query-slow/m-p/408458#M252713</link>
      <description>&lt;P&gt;Hi all,&lt;/P&gt;&lt;P&gt;I'm working with Cloudera Hive for the first time, since my company chose to use this database.&lt;BR /&gt;I received the JDBC connection string and wrote a small method in Java.&lt;BR /&gt;Here is the code:&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;&lt;LI-CODE lang="java"&gt;public String[] getEmployee(){
	// conn is an open java.sql.Connection field; Statement, ResultSet and SQLException also come from java.sql.
	String[] idcod = {"ABC123", "FLS163", "XYZ001", "PLE456", "ERV021" /* , and so on...... */};
	String[] names = new String[idcod.length];
	// try-with-resources closes the statement (and each result set below) even if a query fails
	try (Statement s = conn.createStatement()) {
		System.out.println("Fetch size: " + s.getFetchSize());
		for (int i = 0; i &amp;lt; idcod.length; i++) {
			String query = "SELECT code, surname, name FROM mytable WHERE code='" + idcod[i] + "'";
			int ncount = 0;
			long start1 = System.nanoTime();
			// timed call: one full query is issued per code
			try (ResultSet r = s.executeQuery(query)) {
				long elapsedTimequery = System.nanoTime() - start1;
				// nanoseconds to whole milliseconds, then to seconds
				System.out.println(elapsedTimequery / 1_000_000 / 1000.0 + " seconds");
				while (r.next()) {
					names[i] = r.getString("surname") + " " + r.getString("name");
					ncount++;
				}
			}
			System.out.println(ncount + "--" + idcod[i]);
		}
	} catch (SQLException sqlex) {
		sqlex.printStackTrace();
	} catch (Exception ex) {
		ex.printStackTrace();
	}
	return names;
 }&lt;/LI-CODE&gt;&lt;P&gt;Each call to r = s.executeQuery(query) takes more than 4 seconds to find and return the data, whereas with Oracle a query takes about 2 milliseconds with a fetch size of 10.&lt;BR /&gt;&lt;BR /&gt;Together with our technical contact I tried an Iceberg table, but nothing changed.&lt;BR /&gt;I also tried adding "hive.fetch.task.conversion" to my connection string with every possible value (none, minimal, more), again with no change.&lt;BR /&gt;I can't use an IN clause because my main program handles the codes row by row.&lt;/P&gt;&lt;P&gt;How can I cut the time of a Hive query and get close to the Oracle timing?&lt;/P&gt;&lt;P&gt;Thank you all.&lt;/P&gt;</description>
      <pubDate>Sat, 17 May 2025 11:32:57 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Hive-little-query-slow/m-p/408458#M252713</guid>
      <dc:creator>Dariuz82</dc:creator>
      <dc:date>2025-05-17T11:32:57Z</dc:date>
    </item>
    <item>
      <title>Re: Hive little query slow</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Hive-little-query-slow/m-p/408470#M252719</link>
      <description>&lt;P&gt;Hive is not a low-latency OLTP database like Oracle.&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;Hive is designed for batch processing, not fast single-row lookups.&lt;/LI&gt;&lt;LI&gt;Every SELECT you run triggers a full query execution plan.&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;In your code snippet the queries run row by row, i.e. executeQuery() is called once per code, which is expensive.&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;hive.fetch.task.conversion&lt;/STRONG&gt; won't help here: it optimizes simple SELECTs into client-side fetches, but Hive still builds a full plan behind the scenes.&lt;BR /&gt;&lt;BR /&gt;A better approach is to refactor the loop into a single IN clause.&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;SELECT code, surname, name FROM mytable WHERE code IN ('ABC123', 'FLS163', 'XYZ001', ...)&lt;/LI-CODE&gt;&lt;P&gt;Then store the results in a map.&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;Map&amp;lt;String, String&amp;gt; codeToName = new HashMap&amp;lt;&amp;gt;();
while (r.next()) {
    codeToName.put(r.getString("code"), r.getString("surname") + " " + r.getString("name"));
}&lt;/LI-CODE&gt;&lt;P&gt;Even if you must process the codes row by row, fetching all the data in one batch drastically reduces query overhead.&lt;/P&gt;&lt;P&gt;If the list is too large for an IN clause, insert the values into a temporary Hive table.&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;-- Insert your id list into a temp table
CREATE TEMPORARY TABLE tmp_ids (code STRING);
-- Then insert all your codes into tmp_ids

SELECT a.code, a.surname, a.name
FROM mytable a
JOIN tmp_ids b ON a.code = b.code;&lt;/LI-CODE&gt;&lt;P&gt;Hive can then optimize one join instead of executing many separate queries.&lt;/P&gt;</description>
      <pubDate>Mon, 19 May 2025 10:02:29 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Hive-little-query-slow/m-p/408470#M252719</guid>
      <dc:creator>ggangadharan</dc:creator>
      <dc:date>2025-05-19T10:02:29Z</dc:date>
    </item>
    <item>
      <title>Re: Hive little query slow</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Hive-little-query-slow/m-p/408471#M252720</link>
      <description>&lt;P&gt;A few more recommendations:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;Hive is fastest with columnar storage and predicate pushdown. Store the table as ORC (with Snappy/Zlib compression) if possible.&lt;BR /&gt;Ref -&amp;nbsp;&lt;A href="https://docs.cloudera.com/runtime/7.2.0/hive-performance-tuning/topics/hive_prepare_to_tune_performance.html#:~:text=,vectorized%20by%20examining%20explain%20plans" target="_blank"&gt;https://docs.cloudera.com/runtime/7.2.0/hive-performance-tuning/topics/hive_prepare_to_tune_performance.html#:~:text=,vectorized%20by%20examining%20explain%20plans&lt;/A&gt;&lt;/LI&gt;&lt;LI&gt;Collect statistics and enable predicate pushdown (hive.optimize.ppd=true, the default in recent Hive versions) so that filtering on code skips irrelevant data.&lt;/LI&gt;&lt;LI&gt;If the code column has a limited number of distinct values, consider partitioning or bucketing on it: a partitioned ORC table reads only the needed partition. Also keep vectorization enabled (hive.vectorized.execution.enabled=true), which processes rows in batches and gives a big speedup for scans.&lt;/LI&gt;&lt;/UL&gt;</description>
      <pubDate>Mon, 19 May 2025 10:10:52 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Hive-little-query-slow/m-p/408471#M252720</guid>
      <dc:creator>ggangadharan</dc:creator>
      <dc:date>2025-05-19T10:10:52Z</dc:date>
    </item>
    <item>
      <title>Re: Hive little query slow</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Hive-little-query-slow/m-p/408707#M252769</link>
      <description>&lt;P&gt;Too simple&amp;nbsp;&lt;span class="lia-unicode-emoji" title=":face_savoring_food:"&gt;😋&lt;/span&gt;&lt;BR /&gt;I can't use an IN clause, as I already wrote in my first post.&lt;BR /&gt;Also, my credentials are read-only, so I can't create temporary tables.&lt;/P&gt;</description>
      <pubDate>Mon, 26 May 2025 13:35:43 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Hive-little-query-slow/m-p/408707#M252769</guid>
      <dc:creator>Dariuz82</dc:creator>
      <dc:date>2025-05-26T13:35:43Z</dc:date>
    </item>
  </channel>
</rss>

