Created 03-03-2017 07:30 PM
Hi All,
I am trying to query my hive tables is ORC format from Power BI using Hortonworks Hive ODBC Driver. The time taken to display results for the same query in Power BI is about 10-15 seconds more than the time to display results directly from the Hive shell. I want to understand what is causing this delay and how can I reduce it?
I want to get the results under seconds but that is a different question since Hive shell itself is taking very long to retrieve results. I first want to minimize this time difference between hive shell results and Power BI results.
I tried changing some ODBC driver parameters and TEZ configuration settings but nothing seems to work. Can someone please give me some pointers?
Created 03-07-2017 02:26 AM
The 15 additional seconds are outside of Hive. As such, the changes you make to Tez settings will only impact (positively or negatively) the Hive query, but it will not address your 15 seconds lag outside Hive. You delay is a combination of network latency, ODBC and Power BI rendering it in UI, The focus for tuning should be on ODBC tuning to chunk the data better to increase throughput, address the network latency (if any), caching on Power BI side. There are a few things that you could do on Hive side too, but it is a long shot to guess what you and you did not. For example, use ORC format, use Interactive Query (LLAP), increase the use of caching for map-side joins, improve parallelism by setting a few parameters that will allow a better chunking of the data for increased parallelism, etc.
Created 03-07-2017 02:26 AM
The 15 additional seconds are outside of Hive. As such, the changes you make to Tez settings will only impact (positively or negatively) the Hive query, but it will not address your 15 seconds lag outside Hive. You delay is a combination of network latency, ODBC and Power BI rendering it in UI, The focus for tuning should be on ODBC tuning to chunk the data better to increase throughput, address the network latency (if any), caching on Power BI side. There are a few things that you could do on Hive side too, but it is a long shot to guess what you and you did not. For example, use ORC format, use Interactive Query (LLAP), increase the use of caching for map-side joins, improve parallelism by setting a few parameters that will allow a better chunking of the data for increased parallelism, etc.
Created 03-28-2017 05:14 PM
Thanks @Constantin Stanca. That really helped.
Created 04-02-2018 12:12 PM
your explanation is very helpful. I am also facing a similar problem of slow response in power BI and Microsoft team from azure has recommended us to import hive data to sql and then connect to Power BI for better performance since we are providing a live dashboard to the customers. But it is incurring us an extra cost since we have to pay for SQl transaction too . do you think that is the only option or is there a better way to provide good speed in Power BI dashboard