How to increase Impala performance?

New Contributor

I have 86,000,000 rows on HDFS in Parquet format. When I run a basic Impala query such as "select * from my_table where id = '12345678';", it takes around 35 seconds to return the result. My questions are:

1. Is this normal for 86 million rows?

2. Would adding Impala daemons on other clusters help increase performance?

3. Should I use Solr or Elasticsearch instead of Impala to look up a row by id?

4. What should I do to order rows by id in real time? Any advice? Impala ORDER BY queries are not fast.

Note: Impala daemon mem_limit: 4 GB, max_result_cache_size: 50000

Re: How to increase Impala performance?

Expert Contributor

Have you run "compute stats" for the table?

If not, please do so. It will help.
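
For reference, a minimal sketch of what that could look like in impala-shell, assuming the table is named my_table as in the question. How much it helps this particular query depends on the data layout, so it is worth checking the plan afterwards rather than assuming a speedup.

```sql
-- Gather table and column statistics (table name taken from the question).
COMPUTE STATS my_table;

-- Confirm the statistics were recorded (row counts, per-column stats).
SHOW TABLE STATS my_table;
SHOW COLUMN STATS my_table;

-- Inspect the plan for the query from the question.
EXPLAIN SELECT * FROM my_table WHERE id = '12345678';
```

If the tables in the EXPLAIN output are still reported as missing statistics, the COMPUTE STATS step did not take effect for them.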

 

I have done some performance testing comparing RCFile and Parquet.

So far I haven't seen good performance with Parquet, though it may be that I'm not using Parquet correctly.

You might want to try RCFile too.

Good luck.

Gatsby