10-26-2016
04:02 PM
Hi, I found this note about INSERT operations and parallelism in the Impala documentation:

"Note: The INSERT ... VALUES technique is not suitable for loading large quantities of data into HDFS-based tables, because the insert operations cannot be parallelized, and each one produces a separate data file. Use it for setting up small dimension tables or tiny amounts of data for experimenting with SQL syntax, or with HBase tables. Do not use it for large ETL jobs or benchmark tests for load operations. Do not run scripts with thousands of INSERT ... VALUES statements that insert a single row each time. If you do run INSERT ... VALUES operations to load data into a staging table as one stage in an ETL pipeline, include multiple row values if possible within each VALUES clause, and use a separate database to make cleanup easier if the operation does produce many tiny files."

Source: http://www.cloudera.com/documentation/enterprise/5-5-x/topics/impala_insert.html
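If you really have to stay with INSERT ... VALUES for a staging table, the "multiple row values within each VALUES clause" advice could be sketched roughly like this. It is only an illustration and not from the documentation: the function names, the `batch_size` default, and the table name `staging_db.dim_country` are hypothetical, and the quoting is a naive assumption (backslash-escaped single quotes) that only handles integers, strings, and NULL.

```python
from typing import Iterable, Iterator, List, Sequence, Union

Value = Union[int, str, None]


def sql_literal(value: Value) -> str:
    """Render a Python value as a SQL literal (naive, illustration only)."""
    if value is None:
        return "NULL"
    if isinstance(value, bool):
        # bool is a subclass of int; reject it to avoid emitting True/False as text.
        raise TypeError("bool values are not handled in this sketch")
    if isinstance(value, int):
        return str(value)
    if isinstance(value, str):
        # Assumption: backslash-escaping is acceptable for the target engine.
        escaped = value.replace("\\", "\\\\").replace("'", "\\'")
        return "'" + escaped + "'"
    raise TypeError(f"unsupported type: {type(value).__name__}")


def batched_insert_statements(
    table: str,
    rows: Iterable[Sequence[Value]],
    batch_size: int = 1000,
) -> Iterator[str]:
    """Yield one INSERT ... VALUES statement per batch of rows,
    instead of one statement (and one tiny data file) per row."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batch: List[str] = []
    for row in rows:
        batch.append("(" + ", ".join(sql_literal(v) for v in row) + ")")
        if len(batch) == batch_size:
            yield f"INSERT INTO {table} VALUES {', '.join(batch)}"
            batch = []
    if batch:
        # Flush the final, possibly smaller, batch.
        yield f"INSERT INTO {table} VALUES {', '.join(batch)}"


if __name__ == "__main__":
    rows = [(1, "US"), (2, "DE"), (3, "FR")]
    for stmt in batched_insert_statements("staging_db.dim_country", rows, batch_size=2):
        print(stmt)
```

Each generated statement still produces its own data file, so this only reduces the number of tiny files; for large volumes the quoted note still applies.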