We are using the Cloudera ODBC Driver for Impala to insert data into Hive from a C# application. A simple INSERT statement with 3 columns takes 10 seconds to complete. We have 1M records to insert every day, so this performance is not acceptable. How can we improve it?
Ingesting via INSERT statements is not the preferred method for bulk ingestion. Most users ingest in other ways: streaming ingest with Kafka, NiFi, Spark Streaming, etc., or bulk ingest by copying data files into the cluster and then running the ETL in SQL.
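To make the "copy files, then ETL in SQL" route more concrete, here is a rough sketch in Python (the same idea carries over to C#). It assumes a text-format staging table whose field delimiter is a tab and whose column order matches the final table. The table names `events_staging` and `events`, the HDFS path, and the helper names are all made up for illustration. The batch file still has to be copied into HDFS (for example with `hdfs dfs -put`) before the statements are run, and they can be sent over the same ODBC connection you already have.

```python
def to_delimited_line(row, delimiter="\t"):
    """Turn one record into a single delimited text line.

    Assumption: the staging table is a plain text table, so values must not
    contain the delimiter or a newline; we reject them instead of quoting.
    """
    fields = [str(value) for value in row]
    for field in fields:
        if delimiter in field or "\n" in field:
            raise ValueError(f"field contains delimiter or newline: {field!r}")
    return delimiter.join(fields)


def write_batch(rows, path, delimiter="\t"):
    """Write all records to one local file and return how many were written."""
    count = 0
    with open(path, "w", encoding="utf-8", newline="\n") as out:
        for row in rows:
            out.write(to_delimited_line(row, delimiter) + "\n")
            count += 1
    return count


def load_statements(hdfs_path, staging_table, target_table):
    """Build the two SQL statements that replace the per-row INSERTs.

    All three arguments are hypothetical names supplied by the caller.
    LOAD DATA INPATH moves the file that is already in HDFS into the staging
    table; INSERT ... SELECT then copies the whole batch in one statement.
    """
    return [
        f"LOAD DATA INPATH '{hdfs_path}' INTO TABLE {staging_table}",
        f"INSERT INTO {target_table} SELECT * FROM {staging_table}",
    ]
```

With this shape, a day's worth of records becomes one file and two statements rather than 1M separate INSERTs, so the fixed per-statement cost is paid twice instead of a million times.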
If you want help debugging the performance of a query, you're most likely to get actionable advice if you attach a query profile (for example, the output of the PROFILE command in impala-shell, run right after the query).