Hive Query slowness


The Hive query used by my batch job is taking much longer to run than before. Earlier, the same query took around 5 minutes; now it takes around 22 minutes. I can't change the query. Please suggest how to investigate this issue, or suggest a resolution.

2 ACCEPTED SOLUTIONS


A couple of things you can check:

  1. Check whether the dataset has changed between the previous run and the current run.
  2. I'm not sure how you are running your query. For example, if you are using the Hive CLI, you can use "hive --hiveconf hive.tez.exec.print.summary=true". This should print the pre-execution (compilation, job submission) and DAG execution times after the job completes, which can give hints about where the time is spent.
  3. If you have the Tez UI, that is the best place to start checking where the time is spent.
  4. It would be good to share the query and the "explain <sql>" output with "--hiveconf hive.explain.user=false". If possible, share the "explain formatted <sql>" output, which dumps the plan information in JSON format.
  5. Check whether vertices are running slowly due to resource constraints (i.e., some tasks have started, but others are waiting because resources are not available in the queue or in the cluster).
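
For reference, the CLI invocations from points 2 and 4 could be assembled roughly as follows. This is only a sketch: the helper names, the file name batch_query.hql, and the sample query are made up for illustration, and it assumes the query is run through the Hive CLI with its standard -f (script file) and -e (inline query) options. The two --hiveconf settings are the ones quoted above.

```python
import shlex

# Settings quoted in the answer above; everything else is illustrative.
SUMMARY_CONF = "hive.tez.exec.print.summary=true"
EXPLAIN_CONF = "hive.explain.user=false"


def summary_command(query_file):
    """Hive CLI call that runs a script and prints the Tez execution summary."""
    return ["hive", "--hiveconf", SUMMARY_CONF, "-f", query_file]


def explain_command(query, formatted=False):
    """Hive CLI call that prints the query plan instead of running the query."""
    keyword = "EXPLAIN FORMATTED" if formatted else "EXPLAIN"
    return ["hive", "--hiveconf", EXPLAIN_CONF, "-e", f"{keyword} {query}"]


if __name__ == "__main__":
    # Hypothetical file name and query, shown only to print the command lines.
    print(shlex.join(summary_command("batch_query.hql")))
    print(shlex.join(explain_command("SELECT 1", formatted=True)))
```

Running the same commands for a fast run and a slow run and comparing the summaries side by side is usually the quickest way to see whether the extra time is in compilation, job submission, or DAG execution.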


@Yukti Agrawal

There is a chance that your job is waiting for resources to be released by other jobs running in the cluster. It's worth watching the RM UI from the moment you submit the query until the application state changes to "RUNNING", to see where most of the time is being spent.
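
If watching the RM UI by hand is inconvenient, the same check could be scripted against the ResourceManager REST API. The sketch below is an assumption-laden example, not a definitive tool: the function names and the rm_url argument are hypothetical, and it assumes the /ws/v1/cluster/apps endpoint is reachable without authentication and that each application record carries "id", "state", and "startedTime" (milliseconds since the epoch).

```python
import json
import urllib.request


def accepted_wait_seconds(apps, now_ms):
    """Return {app_id: seconds waited} for applications still in ACCEPTED state."""
    return {
        app["id"]: (now_ms - app["startedTime"]) / 1000.0
        for app in apps
        if app.get("state") == "ACCEPTED"
    }


def fetch_apps(rm_url):
    """Fetch ACCEPTED and RUNNING application records from the ResourceManager."""
    url = f"{rm_url}/ws/v1/cluster/apps?states=ACCEPTED,RUNNING"
    with urllib.request.urlopen(url) as resp:
        payload = json.load(resp)
    # "apps" is null in the response when no application matches the filter.
    return (payload.get("apps") or {}).get("app", [])
```

Calling accepted_wait_seconds(fetch_apps(rm_url), now_ms) shortly after submitting the query, with now_ms taken from the current clock, gives a rough idea of how long the application has been queued before it got resources.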


11 REPLIES


Hi,

When I run the Hive query, it shows the error below:

Error while processing statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask

This error does not appear every time: the query succeeds for some users, and at other times it fails. Could you please suggest the reason and how to overcome it?

This is urgent. Could you please help us?