<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>Re: Impala for real time data ingest - Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Impala-for-real-time-data-ingest/m-p/11634#M1683</link>
    <description>Archived support question thread: Impala for real time data ingest.</description>
    <pubDate>Thu, 01 May 2014 20:12:30 GMT</pubDate>
    <dc:creator>MattH</dc:creator>
    <dc:date>2014-05-01T20:12:30Z</dc:date>
    <item>
      <title>Impala for real time data ingest</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Impala-for-real-time-data-ingest/m-p/11612#M1680</link>
      <description>&lt;P&gt;I am looking at using Impala for “real time” reporting via BI tools.&amp;nbsp; It is my understanding that Impala is the tool to use for real time reporting vs. Hive or custom MapReduce.&amp;nbsp; I also read that using the Parquet format can increase the query speed if only selecting a few columns of data.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;What I would like to do is have Impala managed tables with the Parquet format and using partitions on time, where partitions are at least 1 GB.&amp;nbsp; Another requirement would be that I need to load data into the table in real time at a rate of tens of thousands of records per second.&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;What I am noticing is that the Impala insert times are slow and won’t keep up with this rate.&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Is an Impala managed table not designed for real time ingest?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Is there a better way to load data that can be done in real time?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Should I be utilizing an Impala external table?&amp;nbsp; If so, do the files loaded to HDFS need to be in Parquet format?&amp;nbsp; How do I manage automatic partition creation?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Would performance be better if I were to create a table off data in HBase?&amp;nbsp; If so, does each record need to be structured the same (same fields) for Impala to know how to access it?&lt;/P&gt;</description>
      <pubDate>Fri, 16 Sep 2022 08:58:20 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Impala-for-real-time-data-ingest/m-p/11612#M1680</guid>
      <dc:creator>MattH</dc:creator>
      <dc:date>2022-09-16T08:58:20Z</dc:date>
    </item>
    <item>
      <title>Re: Impala for real time data ingest</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Impala-for-real-time-data-ingest/m-p/11628#M1681</link>
      <description>&lt;P&gt;Hey Matt,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;You may find the &lt;A target="_self" href="https://github.com/kite-sdk/kite-examples/tree/master/dataset-staging"&gt;dataset-staging example from the Kite SDK&lt;/A&gt; to be instructive on how to write messages into HDFS using the row-based Avro format for high write throughput and then periodically rewrite the files using the columnar Parquet format. I haven't tried this example out myself with Impala, so let me know how it works for you if you do make time to try it out!&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Regards,&lt;BR /&gt;Jeff&lt;/P&gt;</description>
      <pubDate>Thu, 01 May 2014 19:45:58 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Impala-for-real-time-data-ingest/m-p/11628#M1681</guid>
      <dc:creator>hammer</dc:creator>
      <dc:date>2014-05-01T19:45:58Z</dc:date>
    </item>
    <item>
      <title>Re: Impala for real time data ingest</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Impala-for-real-time-data-ingest/m-p/11630#M1682</link>
      <description>&lt;BLOCKQUOTE&gt;&lt;FONT color="#FF6600"&gt;Inline.&lt;/FONT&gt;&lt;/BLOCKQUOTE&gt;&lt;BLOCKQUOTE&gt;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/4406"&gt;@MattH&lt;/a&gt; wrote:&lt;BR /&gt;&lt;P&gt;I am looking at using Impala for “real time” reporting via BI tools.&amp;nbsp; It is my understanding that Impala is the tool to use for real time reporting vs. Hive or custom MapReduce.&amp;nbsp; I also read that using the Parquet format can increase the query speed if only selecting a few columns of data.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;What I would like to do is have Impala managed tables with the Parquet format and using partitions on time, where partitions are at least 1 GB.&amp;nbsp; Another requirement would be that I need to load data into the table in real time at a rate of tens of thousands of records per second.&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;FONT color="#FF6600"&gt;Do you mean 10,000 rows per second? What's the size of those rows, i.e. what's the rough MB/second ingest rate?&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;What I am noticing is that the Impala insert times are slow and won’t keep up with this rate.&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;FONT color="#FF6600"&gt;A single insert statement will always generate a new file, so this is probably not what you want to do.&amp;nbsp;&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Is an Impala managed table not designed for real time ingest?&lt;/P&gt;&lt;P&gt;&lt;SPAN style="line-height: 1.2;"&gt;Is there a better way to load data that can be done in real time?&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT color="#FF6600"&gt;&lt;SPAN style="line-height: 1.2;"&gt;A good approach is to stage the ingest data into a row-based format (e.g. Avro). Periodically, when enough data has been accumulated, the data would be transformed into Parquet. (This could be done via Impala, for example by doing an "insert into &amp;lt;parquet_table&amp;gt; select * from staging_table".) Impala can query tables that are mixed format, so the data in the staging format would still be immediately accessible.&lt;/SPAN&gt;&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT color="#FF6600"&gt;&lt;SPAN style="line-height: 1.2;"&gt;Take a look at the Flume project, which will help with this.&amp;nbsp;&lt;A target="_blank" href="http://flume.apache.org/"&gt;http://flume.apache.org/&lt;/A&gt;&lt;/SPAN&gt;&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Should I be utilizing an Impala external table?&amp;nbsp; If so, do the files loaded to HDFS need to be in Parquet format?&amp;nbsp; How do I manage automatic partition creation?&lt;/P&gt;&lt;P&gt;&lt;FONT color="#FF6600"&gt;There is very little distinction between external and managed. The big difference is that if it is managed, the files in HDFS are deleted if the table is dropped. The automatic partition creation is done in the ingest. For example, if you did the ingest via an insert into a date-partitioned table, Impala will automatically create partitions for new date values.&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Would performance be better if I were to create a table off data in HBase?&amp;nbsp; If so, does each record need to be structured the same (same fields) for Impala to know how to access it?&lt;/P&gt;&lt;P&gt;&lt;FONT color="#FF6600"&gt;Scan performance on top of HBase is currently much worse than on top of HDFS. You can take a look at this doc on how to match the schemas.&amp;nbsp;&lt;A target="_blank" href="http://www.cloudera.com/content/cloudera-content/cloudera-docs/Impala/latest/Installing-and-Using-Impala/ciiu_impala_hbase.html"&gt;&lt;FONT color="#FF6600"&gt;http://www.cloudera.com/content/cloudera-content/cloudera-docs/Impala/latest/Installing-and-Using-Impala/ciiu_impala_hbase.html&lt;/FONT&gt;&lt;/A&gt;&lt;/FONT&gt;&lt;/P&gt;&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 01 May 2014 19:52:09 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Impala-for-real-time-data-ingest/m-p/11630#M1682</guid>
      <dc:creator>nong</dc:creator>
      <dc:date>2014-05-01T19:52:09Z</dc:date>
    </item>
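    <!-- The staging-then-convert flow described in the reply above (land rows in an Avro staging table, then periodically rewrite them into Parquet with an "insert into <parquet_table> select * from staging_table") could be sketched roughly as follows. The table names and the ~1 GB threshold are illustrative assumptions, not from the thread; only the INSERT statement itself mirrors the reply. -->

```python
# Sketch of the Avro-staging -> Parquet compaction step. The INSERT mirrors
# the statement quoted in the reply; table names and the threshold are
# hypothetical. TRUNCATE TABLE exists only in newer Impala versions; on
# older ones you would drop and recreate the staging table instead.

def should_compact(staged_bytes: int, threshold_bytes: int = 1 << 30) -> bool:
    """Trigger compaction once roughly a partition's worth (~1 GB) has accumulated."""
    return staged_bytes >= threshold_bytes

def compaction_sql(parquet_table: str, staging_table: str) -> list:
    """Statements to move staged Avro rows into the Parquet table, then clear staging."""
    return [
        f"INSERT INTO {parquet_table} SELECT * FROM {staging_table}",
        # Clear the staging table so the same rows are not copied twice.
        f"TRUNCATE TABLE {staging_table}",
    ]
```

    <!-- Because Impala can query mixed-format tables, rows still sitting in the Avro staging table remain queryable while waiting for the next compaction pass. -->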
    <item>
      <title>Re: Impala for real time data ingest</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Impala-for-real-time-data-ingest/m-p/11634#M1683</link>
      <description>&lt;P&gt;&lt;SPAN&gt;Thanks for the quick reply,&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT color="#000000"&gt;&lt;SPAN&gt;Do you mean 10,000 rows per second? What's the size of those rows, i.e. what's the rough MB/second ingest rate?&lt;/SPAN&gt;&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="line-height: normal; color: #ff0000;"&gt;Yes, about 10,000 to 20,000 records per second, somewhere around 75 MB/sec.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT color="#000000"&gt;&lt;SPAN&gt;A good approach is to stage the ingest data into a row-based format (e.g. Avro). Periodically, when enough data has been accumulated, the data would be transformed into Parquet. (This could be done via Impala, for example by doing an "insert into &amp;lt;parquet_table&amp;gt; select * from staging_table".) Impala can query tables that are mixed format, so the data in the staging format would still be immediately accessible.&lt;/SPAN&gt;&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT color="#FF0000"&gt;&lt;SPAN&gt;This makes sense.&amp;nbsp; I'm assuming I would need to create the process to do the "insert into..." and there is not a built-in timer to run this task on an interval or watch the staging table?&lt;/SPAN&gt;&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT color="#000000"&gt;There is very little distinction between external and managed. The big difference is that if it is managed, the files in HDFS are deleted if the table is dropped. The automatic partition creation is done in the ingest. For example, if you did the ingest via an insert into a date-partitioned table, Impala will automatically create partitions for new date values.&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT color="#FF0000"&gt;&lt;SPAN&gt;If it was external, I would not issue an "insert into..." command, since some other process would be putting the data into HDFS and the table would just be an interface over that data, right?&lt;/SPAN&gt;&lt;/FONT&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 01 May 2014 20:12:30 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Impala-for-real-time-data-ingest/m-p/11634#M1683</guid>
      <dc:creator>MattH</dc:creator>
      <dc:date>2014-05-01T20:12:30Z</dc:date>
    </item>
    <item>
      <title>Re: Impala for real time data ingest</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Impala-for-real-time-data-ingest/m-p/12018#M1684</link>
      <description>&lt;BLOCKQUOTE&gt;&lt;HR /&gt;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/4406"&gt;@MattH&lt;/a&gt; wrote:&lt;BR /&gt;&lt;P&gt;&lt;SPAN&gt;Thanks for the quick reply,&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT color="#000000"&gt;&lt;SPAN&gt;Do you mean 10,000 rows per second? What's the size of those rows, i.e. what's the rough MB/second ingest rate?&lt;/SPAN&gt;&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="line-height: normal; color: #ff0000;"&gt;Yes, about 10,000 to 20,000 records per second, somewhere around 75 MB/sec.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="line-height: normal; color: #ff0000;"&gt;&lt;FONT color="#008000"&gt;That's a pretty good ingest rate. I'd look at Kite and Flume to see if that meets your needs.&lt;/FONT&gt;&lt;BR /&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT color="#000000"&gt;&lt;SPAN&gt;A good approach is to stage the ingest data into a row-based format (e.g. Avro). Periodically, when enough data has been accumulated, the data would be transformed into Parquet. (This could be done via Impala, for example by doing an "insert into &amp;lt;parquet_table&amp;gt; select * from staging_table".) Impala can query tables that are mixed format, so the data in the staging format would still be immediately accessible.&lt;/SPAN&gt;&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT color="#FF0000"&gt;&lt;SPAN&gt;This makes sense.&amp;nbsp; I'm assuming I would need to create the process to do the "insert into..." and there is not a built-in timer to run this task on an interval or watch the staging table?&lt;/SPAN&gt;&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT color="#008000"&gt;No, there is no built-in way to schedule queries periodically. Flume has mechanisms to do this based either on time or on data volume.&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;FONT color="#000000"&gt;There is very little distinction between external and managed. The big difference is that if it is managed, the files in HDFS are deleted if the table is dropped. The automatic partition creation is done in the ingest. For example, if you did the ingest via an insert into a date-partitioned table, Impala will automatically create partitions for new date values.&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT color="#FF0000"&gt;&lt;SPAN&gt;If it was external, I would not issue an "insert into..." command, since some other process would be putting the data into HDFS and the table would just be an interface over that data, right?&lt;/SPAN&gt;&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT color="#008000"&gt;It doesn't matter if the table is external or managed; you can still drop files into the path in HDFS and have them picked up. The distinction is what happens on the drop-table path.&lt;/FONT&gt;&lt;/P&gt;&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 05 May 2014 18:32:12 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Impala-for-real-time-data-ingest/m-p/12018#M1684</guid>
      <dc:creator>nong</dc:creator>
      <dc:date>2014-05-05T18:32:12Z</dc:date>
    </item>
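    <!-- The reply above says there is no built-in way to schedule the conversion query, and points at Flume, which rolls files based on time or data volume. A self-built trigger in that spirit might look like the sketch below; the class, thresholds, and method names are illustrative assumptions, not an existing API. -->

```python
# Minimal sketch of the external scheduling logic the thread says you must
# build yourself: fire the Parquet conversion either after an interval has
# elapsed or once enough staged data has accumulated, mirroring Flume's
# time/size-based roll policies. All names and defaults are hypothetical.
import time
from typing import Optional

class RollPolicy:
    def __init__(self, max_age_s: float = 300.0, max_bytes: int = 1 << 30):
        self.max_age_s = max_age_s        # time-based trigger
        self.max_bytes = max_bytes        # volume-based trigger
        self.staged_bytes = 0
        self.last_roll = time.monotonic()

    def record(self, nbytes: int) -> None:
        """Account for newly staged data."""
        self.staged_bytes += nbytes

    def should_roll(self, now: Optional[float] = None) -> bool:
        """True once either the size or the age threshold is crossed."""
        now = time.monotonic() if now is None else now
        return (self.staged_bytes >= self.max_bytes
                or now - self.last_roll >= self.max_age_s)

    def mark_rolled(self, now: Optional[float] = None) -> None:
        """Reset counters after the compaction query has been issued."""
        self.staged_bytes = 0
        self.last_roll = time.monotonic() if now is None else now
```

    <!-- A driver loop would call record() as data lands, and issue the "insert into ... select ..." statement whenever should_roll() returns True, then call mark_rolled(). -->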
  </channel>
</rss>