Support Questions
Find answers, ask questions, and share your expertise
Announcements
Alert: Welcome to the Unified Cloudera Community. Former HCC members be sure to read and learn how to activate your account here.

Time difference between Query results from Hive and Power BI

SOLVED Go to solution
Highlighted

Time difference between Query results from Hive and Power BI

New Contributor

Hi All,

I am trying to query my hive tables is ORC format from Power BI using Hortonworks Hive ODBC Driver. The time taken to display results for the same query in Power BI is about 10-15 seconds more than the time to display results directly from the Hive shell. I want to understand what is causing this delay and how can I reduce it?

I want to get the results under seconds but that is a different question since Hive shell itself is taking very long to retrieve results. I first want to minimize this time difference between hive shell results and Power BI results.

I tried changing some ODBC driver parameters and TEZ configuration settings but nothing seems to work. Can someone please give me some pointers?

1 ACCEPTED SOLUTION

Accepted Solutions

Re: Time difference between Query results from Hive and Power BI

@Sree Kupp

The 15 additional seconds are outside of Hive. As such, the changes you make to Tez settings will only impact (positively or negatively) the Hive query, but it will not address your 15 seconds lag outside Hive. You delay is a combination of network latency, ODBC and Power BI rendering it in UI, The focus for tuning should be on ODBC tuning to chunk the data better to increase throughput, address the network latency (if any), caching on Power BI side. There are a few things that you could do on Hive side too, but it is a long shot to guess what you and you did not. For example, use ORC format, use Interactive Query (LLAP), increase the use of caching for map-side joins, improve parallelism by setting a few parameters that will allow a better chunking of the data for increased parallelism, etc.

3 REPLIES 3

Re: Time difference between Query results from Hive and Power BI

@Sree Kupp

The 15 additional seconds are outside of Hive. As such, the changes you make to Tez settings will only impact (positively or negatively) the Hive query, but it will not address your 15 seconds lag outside Hive. You delay is a combination of network latency, ODBC and Power BI rendering it in UI, The focus for tuning should be on ODBC tuning to chunk the data better to increase throughput, address the network latency (if any), caching on Power BI side. There are a few things that you could do on Hive side too, but it is a long shot to guess what you and you did not. For example, use ORC format, use Interactive Query (LLAP), increase the use of caching for map-side joins, improve parallelism by setting a few parameters that will allow a better chunking of the data for increased parallelism, etc.

Re: Time difference between Query results from Hive and Power BI

New Contributor

Thanks @Constantin Stanca. That really helped.

Re: Time difference between Query results from Hive and Power BI

New Contributor

Hi @Constantin Stanca

your explanation is very helpful. I am also facing a similar problem of slow response in power BI and Microsoft team from azure has recommended us to import hive data to sql and then connect to Power BI for better performance since we are providing a live dashboard to the customers. But it is incurring us an extra cost since we have to pay for SQl transaction too . do you think that is the only option or is there a better way to provide good speed in Power BI dashboard