<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Nifi &amp; Hive are not running map jobs in parallel. How can I better utilize my Big Data?, in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Nifi-Hive-are-not-running-map-jobs-in-parallel-How-can-I/m-p/139212#M43949</link>
    <description>&lt;P&gt;Hi, I found this about insert operation and parallelism:&lt;/P&gt;&lt;P&gt;Note: The INSERT ... VALUES technique is not suitable for loading large quantities of data into HDFS-based tables, because &lt;STRONG&gt;the insert operations cannot be parallelized&lt;/STRONG&gt;, and each one produces a separate data file. Use it for setting up small dimension tables or tiny amounts of data for experimenting with SQL syntax, or with HBase tables. Do not use it for large ETL jobs or benchmark tests for load operations. Do not run scripts with thousands of INSERT ... VALUES statements that insert a single row each time. If you do run INSERT ... VALUES operations to load data into a staging table as one stage in an ETL pipeline, include multiple row values if possible within each VALUES clause, and use a separate database to make cleanup easier if the operation does produce many tiny files.&lt;/P&gt;&lt;P&gt;&lt;A href="http://www.cloudera.com/documentation/enterprise/5-5-x/topics/impala_insert.html"&gt;http://www.cloudera.com/documentation/enterprise/5-5-x/topics/impala_insert.html&lt;/A&gt;&lt;/P&gt;</description>
    <pubDate>Wed, 26 Oct 2016 23:02:19 GMT</pubDate>
    <dc:creator>liranye</dc:creator>
    <dc:date>2016-10-26T23:02:19Z</dc:date>
    <item>
      <title>Nifi &amp; Hive are not running map jobs in parallel. How can I better utilize my Big Data?,</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Nifi-Hive-are-not-running-map-jobs-in-parallel-How-can-I/m-p/139208#M43945</link>
      <description>&lt;P&gt;Hi all,&lt;/P&gt;&lt;P&gt;I'm in the middle of a massive ingest process in NiFi using PutHiveQL, which is pretty much choking.&lt;/P&gt;&lt;P&gt;Using: NiFi 1 (Beta - until the queues get empty) and CDH 5.4.3 (Hive 1.1)&lt;/P&gt;&lt;P&gt;I have done everything I could think of to enable parallel processing, but I still see only one or two jobs running in parallel.&lt;/P&gt;&lt;P&gt;Do you think I'm missing something?&lt;/P&gt;&lt;P&gt;1. Hive configuration - enabled hive.exec.parallel and increased hive.exec.parallel.thread.number&lt;/P&gt;&lt;P&gt;&amp;lt;property&amp;gt;
&amp;lt;name&amp;gt;hive.exec.parallel&amp;lt;/name&amp;gt;
&amp;lt;value&amp;gt;true&amp;lt;/value&amp;gt;
&amp;lt;description&amp;gt;Whether to execute jobs in parallel&amp;lt;/description&amp;gt;
&amp;lt;/property&amp;gt;
&amp;lt;property&amp;gt;
&amp;lt;name&amp;gt;hive.exec.parallel.thread.number&amp;lt;/name&amp;gt;
&amp;lt;value&amp;gt;35&amp;lt;/value&amp;gt;
&amp;lt;description&amp;gt;How many jobs at most can be executed in parallel&amp;lt;/description&amp;gt;
&amp;lt;/property&amp;gt;&lt;/P&gt;&lt;P&gt;2. Set Concurrent Tasks on Connect2HiveAndExec to 10&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="8684-nifi-connect2hiveandexec-concurrent-tasks.png" style="width: 834px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/21862iEC8A8EABB0E5C8B7/image-size/medium?v=v2&amp;amp;px=400" role="button" title="8684-nifi-connect2hiveandexec-concurrent-tasks.png" alt="8684-nifi-connect2hiveandexec-concurrent-tasks.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;3. Increased the NiFi Max threads settings&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="8685-nifi-settings-threads.png" style="width: 411px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/21863i49C129D91BC2079A/image-size/medium?v=v2&amp;amp;px=400" role="button" title="8685-nifi-settings-threads.png" alt="8685-nifi-settings-threads.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="8683-hue-job-browser.png" style="width: 1920px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/21864iC9725D699455B4A7/image-size/medium?v=v2&amp;amp;px=400" role="button" title="8683-hue-job-browser.png" alt="8683-hue-job-browser.png" /&gt;&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 21 Apr 2026 13:48:44 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Nifi-Hive-are-not-running-map-jobs-in-parallel-How-can-I/m-p/139208#M43945</guid>
      <dc:creator>liranye</dc:creator>
      <dc:date>2026-04-21T13:48:44Z</dc:date>
    </item>
    <item>
      <title>Re: Nifi &amp; Hive are not running map jobs in parallel. How can I better utilize my Big Data?,</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Nifi-Hive-are-not-running-map-jobs-in-parallel-How-can-I/m-p/139209#M43946</link>
      <description>&lt;P&gt;PutHiveQL is not really intended to be used for "massive ingest" purposes since it has to go through the Hive JDBC driver which has a lot of overhead for a single insert. PutHiveStreaming would probably be what you want to use, or just writing data to a directory in HDFS (using PutHDFS) and creating a Hive external table on top of it.&lt;/P&gt;</description>
      <pubDate>Wed, 19 Oct 2016 19:57:04 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Nifi-Hive-are-not-running-map-jobs-in-parallel-How-can-I/m-p/139209#M43946</guid>
      <dc:creator>bbende</dc:creator>
      <dc:date>2016-10-19T19:57:04Z</dc:date>
    </item>
    <item>
      <title>Re: Nifi &amp; Hive are not running map jobs in parallel. How can I better utilize my Big Data?,</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Nifi-Hive-are-not-running-map-jobs-in-parallel-How-can-I/m-p/139210#M43947</link>
      <description>&lt;P&gt;Thanks for the fastest response ever &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;&lt;/P&gt;&lt;P&gt;You are generally right, but this flow should stabilize soon: once it finishes loading the history files, it will handle only two files per minute. The current "backlog" is about 2500 insert commands waiting in the queue. Maybe I exaggerated by using "massive ingest" to describe the problem...&lt;/P&gt;&lt;P&gt;Is there another way to temporarily boost this process?&lt;/P&gt;</description>
      <pubDate>Wed, 19 Oct 2016 20:13:55 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Nifi-Hive-are-not-running-map-jobs-in-parallel-How-can-I/m-p/139210#M43947</guid>
      <dc:creator>liranye</dc:creator>
      <dc:date>2016-10-19T20:13:55Z</dc:date>
    </item>
    <item>
      <title>Re: Nifi &amp; Hive are not running map jobs in parallel. How can I better utilize my Big Data?,</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Nifi-Hive-are-not-running-map-jobs-in-parallel-How-can-I/m-p/139211#M43948</link>
      <description>&lt;P&gt;I suspect it is something more on the Hive side of things, which is out of my domain.&lt;/P&gt;&lt;P&gt;Increasing the concurrent tasks on the PutHiveQL processor is the appropriate approach on the NiFi side. Somewhere between 1 and 5 concurrent tasks is usually enough, but the concurrent tasks can only work as fast as whatever they are calling. If all 10 of your threads call the Hive JDBC driver, and 2 of them are doing work while 8 are blocked because of something in Hive, then there isn't much NiFi can do.&lt;/P&gt;</description>
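      <content:encoded><![CDATA[
The point about blocked threads can be illustrated with a small simulation. This is only a sketch under stated assumptions: it does not talk to NiFi or Hive, `execute_statement` is a hypothetical stand-in for one blocking JDBC call, and `BACKEND_SLOTS = 2` is an assumed limit chosen to match the "one or two jobs" seen in the question. It shows that a pool of 10 callers never gets more than 2 statements in flight when the thing they call admits only 2 at a time.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

NIFI_CONCURRENT_TASKS = 10  # mirrors the Concurrent Tasks value from the question
BACKEND_SLOTS = 2           # assumption: statements the backend runs at once

backend = threading.Semaphore(BACKEND_SLOTS)
counter_lock = threading.Lock()
running = 0  # statements currently executing
peak = 0     # highest value of `running` seen so far


def execute_statement(statement_id: int) -> int:
    """Hypothetical stand-in for one blocking JDBC call (not a real Hive client)."""
    global running, peak
    with backend:  # callers beyond BACKEND_SLOTS wait here
        with counter_lock:
            running += 1
            peak = max(peak, running)
        time.sleep(0.01)  # pretend the statement is executing
        with counter_lock:
            running -= 1
    return statement_id


# Ten worker threads submit thirty statements, like a processor draining a queue.
with ThreadPoolExecutor(max_workers=NIFI_CONCURRENT_TASKS) as pool:
    completed = list(pool.map(execute_statement, range(30)))

print(f"completed={len(completed)} peak_concurrency={peak}")
```

Raising `NIFI_CONCURRENT_TASKS` in this sketch does not raise the peak; only changing `BACKEND_SLOTS` does, which is the reason for looking at the Hive side first.
]]></content:encoded>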
      <pubDate>Wed, 19 Oct 2016 20:24:24 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Nifi-Hive-are-not-running-map-jobs-in-parallel-How-can-I/m-p/139211#M43948</guid>
      <dc:creator>bbende</dc:creator>
      <dc:date>2016-10-19T20:24:24Z</dc:date>
    </item>
    <item>
      <title>Re: Nifi &amp; Hive are not running map jobs in parallel. How can I better utilize my Big Data?,</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Nifi-Hive-are-not-running-map-jobs-in-parallel-How-can-I/m-p/139212#M43949</link>
      <description>&lt;P&gt;Hi, I found this about insert operation and parallelism:&lt;/P&gt;&lt;P&gt;Note: The INSERT ... VALUES technique is not suitable for loading large quantities of data into HDFS-based tables, because &lt;STRONG&gt;the insert operations cannot be parallelized&lt;/STRONG&gt;, and each one produces a separate data file. Use it for setting up small dimension tables or tiny amounts of data for experimenting with SQL syntax, or with HBase tables. Do not use it for large ETL jobs or benchmark tests for load operations. Do not run scripts with thousands of INSERT ... VALUES statements that insert a single row each time. If you do run INSERT ... VALUES operations to load data into a staging table as one stage in an ETL pipeline, include multiple row values if possible within each VALUES clause, and use a separate database to make cleanup easier if the operation does produce many tiny files.&lt;/P&gt;&lt;P&gt;&lt;A href="http://www.cloudera.com/documentation/enterprise/5-5-x/topics/impala_insert.html"&gt;http://www.cloudera.com/documentation/enterprise/5-5-x/topics/impala_insert.html&lt;/A&gt;&lt;/P&gt;</description>
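      <content:encoded><![CDATA[
The advice to "include multiple row values if possible within each VALUES clause" can be sketched as a small helper that turns many single-row inserts into a few multi-row statements. This is a minimal illustration, not a tested loader: the function names, the `staging.events` table, and the default batch size of 500 are all hypothetical, and the backslash escaping in `sql_literal` is a simplified assumption. Check the generated statements against your Hive version before running them.

```python
from typing import Iterator, Sequence, Union

Value = Union[str, int, float, None]


def sql_literal(value: Value) -> str:
    """Render a Python value as a SQL literal (simplified escaping, for illustration)."""
    if value is None:
        return "NULL"
    if isinstance(value, (int, float)):
        return str(value)
    # Assumption: backslash escaping for backslashes and single quotes.
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return "'" + escaped + "'"


def batched_inserts(
    table: str, rows: Sequence[Sequence[Value]], batch_size: int = 500
) -> Iterator[str]:
    """Yield one multi-row INSERT ... VALUES statement per batch of rows."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(rows), batch_size):
        chunk = rows[start:start + batch_size]
        values = ", ".join(
            "(" + ", ".join(sql_literal(v) for v in row) + ")" for row in chunk
        )
        yield f"INSERT INTO TABLE {table} VALUES {values}"


# Three rows with a batch size of two produce two statements instead of three.
rows = [(1, "a"), (2, "b"), (3, "c")]
statements = list(batched_inserts("staging.events", rows, batch_size=2))
for statement in statements:
    print(statement)
```

With a backlog of about 2500 single-row inserts, a batch size of 500 would reduce the number of statements to 5, so far fewer jobs and data files are produced.
]]></content:encoded>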
      <pubDate>Wed, 26 Oct 2016 23:02:19 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Nifi-Hive-are-not-running-map-jobs-in-parallel-How-can-I/m-p/139212#M43949</guid>
      <dc:creator>liranye</dc:creator>
      <dc:date>2016-10-26T23:02:19Z</dc:date>
    </item>
  </channel>
</rss>

