I've moved a web application from MySQL to Impala. To avoid synchronization issues, I want to use Impala exclusively, even for very simple lookup queries. For the most part this has been working out great: a query that took 20 minutes in MySQL now runs in 2 seconds on Impala.
However, each page load can fire around 10-15 queries. This adds up, as simple Impala lookups take just under a second and other simple queries can take 1-2 seconds. Adding that all up, each page takes around 20-30 seconds to load.
I'm running 4 nodes with 32 GB of memory each. Memory utilization stays under 100 MB during a page load. Is there any way to speed up Impala to get these queries even faster? Or is using Impala to run an entire website not a realistic application for it?
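The latency arithmetic above can be sketched in a few lines of Python. This is only a rough model, not a measurement: the function names and the mix of query times are assumptions made for illustration, and the thread does not establish whether the PHP client or the cluster would actually allow the queries for one page to run concurrently.

```python
# Rough model of page load time. All names and numbers here are hypothetical.

def sequential_latency(query_seconds):
    """Total time when queries are issued one after another."""
    return sum(query_seconds)


def concurrent_latency(query_seconds):
    """Idealized total when all queries run at once: the slowest one dominates."""
    return max(query_seconds)


# Assumed mix for one page: 5 lookups at about 1 s and 10 other queries at about 2 s.
page_queries = [1.0] * 5 + [2.0] * 10

print(sequential_latency(page_queries))  # 25.0
print(concurrent_latency(page_queries))  # 2.0
```

Under these assumptions, the per-page total is driven by the number of queries issued back to back rather than by any single slow query, so cutting the query count per page or overlapping the queries would matter more than shaving a few hundred milliseconds off each one.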
What SLAs do you need? Can you include the runtime profile for the query you're running, so we can see if there are some easy things you can try?
Could you please share how you run Impala queries from the website? For example, did you use the Impala ODBC connector from PHP or jQuery?
Thanks in advance!
I used this PHP library: https://github.com/rmcfrazier/php_impala_phar
We ended up moving from Impala to a different technology, as we could not figure out a way to get query times below 1-2 seconds, even for simple queries.
Thanks for your reply. We are following the same approach.
Out of interest, which other technology stack are you evaluating? Based on reports from the Hadoop World conference, Impala seemed to be the fastest of the comparable query tools on Hadoop, so I was wondering which other toolset could give better results.