
Impala Partition Query running Slow


I am trying to partition my Impala table by the column 'file', which has 1500 distinct values, so the table will have 1500 partitions. I first ran a query like this to generate the per-partition INSERT statements:

SELECT DISTINCT
  concat('insert into partitioned_table partition (year=',
    cast(year as string),', month=',cast(month as string),
    ') select c1, c2, c3 from raw_data where year=',
    cast(year as string),' and month=',cast(month as string),';') AS command
  FROM raw_data;
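
To illustrate what the generator query above returns, here is the single command it would produce for one hypothetical row with year = 2017 and month = 2 (these values are assumed for illustration, not taken from the actual data):

```sql
-- One generated statement, for the assumed values year=2017, month=2.
-- The generator emits one such statement per distinct (year, month) pair.
insert into partitioned_table partition (year=2017, month=2) select c1, c2, c3 from raw_data where year=2017 and month=2;
```

Each generated statement rescans raw_data with its own filter, which is why the total run time grows with the number of partitions.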

That gave me 1500 queries to run.


Now there is one problem: each query might take 3 minutes to finish, so 1500 queries could take several days, which is a really long time. To save time, I have already done some tuning: running COMPUTE STATS to gather statistics and converting the table to Parquet. My question is: is there a way to speed up this process, such as maxing out the executors the way Hive can?

 

My Cluster:

3 nodes, each with a 6-core CPU, 32 GB RAM, and a 1.8 TB hard disk

 

Any thoughts would be helpful. Thank you!

1 ACCEPTED SOLUTION


Re: Impala Partition Query running Slow


I found the solution. Here is the link
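
The target of that link is not shown in this thread, so the following is not necessarily the approach it described. One commonly used alternative to running a separate INSERT per partition is Impala's dynamic partition insert, which fills all partitions in a single statement. A minimal sketch, assuming the table and column names from the question (partitioned_table, raw_data, c1, c2, c3, year, month):

```sql
-- Dynamic partition insert: the partition key columns are named in the
-- PARTITION clause without values, and the SELECT list supplies them
-- as its last columns, in the same order (year, then month).
INSERT INTO partitioned_table PARTITION (year, month)
SELECT c1, c2, c3, year, month
FROM raw_data;
```

This scans raw_data once instead of once per partition. Writing many Parquet partitions in one statement can be memory-intensive, so on a small cluster it may be worth splitting the load into a few batches with a WHERE clause, for example one statement per year.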
