Support Questions

Find answers, ask questions, and share your expertise
Announcements
Celebrating as our community reaches 100,000 members! Thank you!

Time difference between Query results from Hive and Power BI

avatar
Contributor

Hi All,

I am trying to query my hive tables is ORC format from Power BI using Hortonworks Hive ODBC Driver. The time taken to display results for the same query in Power BI is about 10-15 seconds more than the time to display results directly from the Hive shell. I want to understand what is causing this delay and how can I reduce it?

I want to get the results under seconds but that is a different question since Hive shell itself is taking very long to retrieve results. I first want to minimize this time difference between hive shell results and Power BI results.

I tried changing some ODBC driver parameters and TEZ configuration settings but nothing seems to work. Can someone please give me some pointers?

1 ACCEPTED SOLUTION

avatar
Super Guru

@Sree Kupp

The 15 additional seconds are outside of Hive. As such, the changes you make to Tez settings will only impact (positively or negatively) the Hive query, but it will not address your 15 seconds lag outside Hive. You delay is a combination of network latency, ODBC and Power BI rendering it in UI, The focus for tuning should be on ODBC tuning to chunk the data better to increase throughput, address the network latency (if any), caching on Power BI side. There are a few things that you could do on Hive side too, but it is a long shot to guess what you and you did not. For example, use ORC format, use Interactive Query (LLAP), increase the use of caching for map-side joins, improve parallelism by setting a few parameters that will allow a better chunking of the data for increased parallelism, etc.

View solution in original post

3 REPLIES 3

avatar
Super Guru

@Sree Kupp

The 15 additional seconds are outside of Hive. As such, the changes you make to Tez settings will only impact (positively or negatively) the Hive query, but it will not address your 15 seconds lag outside Hive. You delay is a combination of network latency, ODBC and Power BI rendering it in UI, The focus for tuning should be on ODBC tuning to chunk the data better to increase throughput, address the network latency (if any), caching on Power BI side. There are a few things that you could do on Hive side too, but it is a long shot to guess what you and you did not. For example, use ORC format, use Interactive Query (LLAP), increase the use of caching for map-side joins, improve parallelism by setting a few parameters that will allow a better chunking of the data for increased parallelism, etc.

avatar
Contributor

Thanks @Constantin Stanca. That really helped.

avatar
New Contributor

Hi @Constantin Stanca

your explanation is very helpful. I am also facing a similar problem of slow response in power BI and Microsoft team from azure has recommended us to import hive data to sql and then connect to Power BI for better performance since we are providing a live dashboard to the customers. But it is incurring us an extra cost since we have to pay for SQl transaction too . do you think that is the only option or is there a better way to provide good speed in Power BI dashboard