I have 86,000,000 rows on HDFS in Parquet format. When I run a basic query in Impala such as "select * from my_table where id = '12345678';", it takes around 35 seconds to return the result. My questions are:
1. Is it normal for 86 million rows?
3. Would adding Impala daemons on other clusters help to increase performance?
4. Should I use Solr or Elasticsearch instead of Impala to look up a row by id?
5. What should I do to order rows by id in real time? Any advice? Impala ORDER BY queries are not fast.
Note: Impala daemon mem_limit: 4 GB, max_result_cache_size: 50000
Have you run "compute stats" for the table?
If not, please do so. It will help.
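For reference, a minimal sketch of what that looks like in impala-shell, assuming the table is named my_table as in the question:

```sql
-- Gather table and column statistics for the query planner.
COMPUTE STATS my_table;

-- Check that statistics are now present (#Rows shows -1 when they are missing).
SHOW TABLE STATS my_table;
SHOW COLUMN STATS my_table;

-- Inspect the plan for the slow query before re-running it.
EXPLAIN SELECT * FROM my_table WHERE id = '12345678';
```

After re-running the query, typing SUMMARY in impala-shell should show how much time each plan step took, which may tell you whether the scan itself is the slow part.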
I have done some performance testing between RCFile and Parquet.
So far I haven't seen good performance with Parquet, though it may be that I'm not using Parquet correctly.
You might want to try RCFile too.
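If you want to make that comparison, one way is to create an RCFile copy of the table and time the same query against it. This is only a sketch: my_table_rc is a made-up name, and as far as I know the copy has to be written from Hive, since Impala can read RCFile but not write it.

```sql
-- Run in Hive (not Impala): write an RCFile copy of the table.
CREATE TABLE my_table_rc STORED AS RCFILE AS SELECT * FROM my_table;

-- Run in impala-shell: make the new table visible to Impala,
-- then time the same lookup against the RCFile copy.
INVALIDATE METADATA my_table_rc;
SELECT * FROM my_table_rc WHERE id = '12345678';
```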