
How to increase Impala performance?


I have 86,000,000 rows stored on HDFS in Parquet format. When I run a basic query in Impala, such as "select * from my_table where id = '12345678';", it takes around 35 seconds before the result is shown. My questions are:

1. Is this response time normal for 86 million rows?

2. Would adding Impala daemons on additional nodes improve performance?

3. Should I use Solr or Elasticsearch instead of Impala to look up rows by id?

4. How can I get rows ordered by id in real time? Any advice? ORDER BY queries in Impala are not fast.

Note: the Impala daemon settings are mem_limit: 4 GB and max_result_cache_size: 50000.


Re: How to increase Impala performance?


Have you run COMPUTE STATS on the table?

If not, please do so. It collects the table and column statistics that Impala's query planner relies on, so it should help.
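
A minimal sketch of what that looks like in impala-shell, assuming the table is named my_table as in the question. COMPUTE STATS, SHOW TABLE STATS, SHOW COLUMN STATS and EXPLAIN are standard Impala statements, and SUMMARY is an impala-shell command; the exact output columns may differ between Impala versions.

```sql
-- Gather table- and column-level statistics for the planner.
COMPUTE STATS my_table;

-- Verify the statistics are populated (#Rows shows -1 when they are missing).
SHOW TABLE STATS my_table;
SHOW COLUMN STATS my_table;

-- Look at the query plan for the slow lookup.
EXPLAIN SELECT * FROM my_table WHERE id = '12345678';

-- Run the query, then ask impala-shell for a per-operator timing breakdown
-- to see where the 35 seconds are spent.
SELECT * FROM my_table WHERE id = '12345678';
SUMMARY;
```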

I have done some performance testing comparing RCFile and Parquet.

So far, I haven't seen good performance with Parquet, although I may not be using Parquet correctly.

You might want to try RCFile too.
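
One way to set up such a comparison, sketched under a few assumptions: my_table_rc is a hypothetical name for the copy, and the copy is created in Hive because, as far as I know, Impala can query RCFile tables but cannot INSERT into them. Check this against the Impala and Hive versions on your cluster.

```sql
-- In Hive: create an RCFile copy of the Parquet table (hypothetical name).
CREATE TABLE my_table_rc STORED AS RCFILE AS SELECT * FROM my_table;

-- In Impala: make the new table visible and gather its statistics.
INVALIDATE METADATA my_table_rc;
COMPUTE STATS my_table_rc;

-- Run the same lookup against both formats and compare the timings.
SELECT * FROM my_table_rc WHERE id = '12345678';
SELECT * FROM my_table WHERE id = '12345678';
```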

Good luck.

Gatsby
