Hive query stalls on hive cli

Hi, after experiencing very slow performance on various VMs, I am now using the HDP Sandbox on an Azure A6 configuration (4 cores, 28GB memory). I assume this should be enough for reasonable performance. Yet I got a Java heap OutOfMemoryError while running simple queries based on the Hello World tutorial (e.g. SELECT max(mpg) FROM truck_mileage). I increased the Tez container size to 2048MB, which solved the OutOfMemoryError, but now the query stalls. What is going wrong? Are there any parameters that should be set differently? Thanks in advance.

hive> select max(mpg) from truck_mileage;
Query ID = hive_20160123104344_a796f4ba-2fa0-4272-b4e8-42720fd32417
Total jobs = 1
Launching Job 1 out of 1
Status: Running (Executing on YARN cluster with App id application_1453543525470_0004)
--------------------------------------------------------------------------------
        VERTICES      STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
--------------------------------------------------------------------------------
Map 1                 INITED      1          0        0        1       0       0
Reducer 2             INITED      1          0        0        1       0       0
-------------------------------------------------------------------------------
VERTICES: 00/02  [>>--------------------------] 0%    ELAPSED TIME: 1236.84 s
--------------------------------------------------------------------------------
Accepted Solutions

Re: Hive query stalls on hive cli

I just solved this Azure Sandbox issue based on a comment by Paul Hargis I found on one of the tutorial pages:

Workaround for Hive queries OutOfMemory errors:

Please note that in some cases (such as when running the Hortonworks Sandbox on a Microsoft Azure VM of size ‘A4’), some Hive queries will produce OutOfMemory (Java heap) errors. As a workaround, you can adjust some Hive-Tez configuration parameters using the Ambari console. Go to the Services–>Hive page, click on the ‘Configs’ tab, and make the following changes:

1) Scroll down to the Optimization section and increase Tez Container Size from 200 to 512. Param: “hive.tez.container.size” Value: 512

2) Click on the “Advanced” tab to show extra settings, scroll down to the parameter “hive.tez.java.opts”, and increase the Java heap maximum size in Hive-Tez Java Opts from 200MB to 512MB. Param: “hive.tez.java.opts” Value: “-server -Xmx512m -Djava.net.preferIPv4Stack=true”
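The two settings above are linked: the Java heap set by -Xmx in hive.tez.java.opts lives inside the YARN container sized by hive.tez.container.size, so the heap should not be larger than the container. Below is a minimal bash sketch of that sanity check. The function name check_tez_heap is hypothetical (it is not part of Hive or Ambari), it only understands -Xmx values with an m/M or g/G suffix, and it does not enforce the extra headroom for non-heap JVM memory that tuning guides commonly recommend (heap at roughly 80% of the container is an often-quoted rule of thumb).

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Hive or Ambari): check that the -Xmx heap
# in a hive.tez.java.opts string fits inside hive.tez.container.size (in MB).
# Usage: check_tez_heap CONTAINER_MB "JAVA_OPTS"
# Returns 0 if the heap fits, 1 if it is larger, 2 if no usable -Xmx was found.
check_tez_heap() {
  local container_mb=$1 opts=$2 heap_mb="" opt
  local -a parts
  read -r -a parts <<< "$opts"   # split the opts string on whitespace

  # If -Xmx appears more than once, the last occurrence is used.
  for opt in "${parts[@]}"; do
    case $opt in
      -Xmx*[mM]) heap_mb=${opt#-Xmx}; heap_mb=${heap_mb%[mM]} ;;
      -Xmx*[gG]) heap_mb=${opt#-Xmx}; heap_mb=$(( ${heap_mb%[gG]} * 1024 )) ;;
    esac
  done

  if [[ -z $heap_mb ]]; then
    echo "no -Xmx with an m/g suffix found in: $opts"
    return 2
  fi
  if (( heap_mb > container_mb )); then
    echo "heap ${heap_mb}MB exceeds container ${container_mb}MB"
    return 1
  fi
  echo "heap ${heap_mb}MB fits in container ${container_mb}MB"
}

# Values from the workaround above.
check_tez_heap 512 "-server -Xmx512m -Djava.net.preferIPv4Stack=true"
```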

Re: Hive query stalls on hive cli

@Michel Meulpolder

Try running the Hive shell in debug mode first to see where it is getting stuck:

hive --hiveconf hive.root.logger=DEBUG,console

Re: Hive query stalls on hive cli

Thanks Kuldeep. I tried, but as soon as the progress bar is shown, no additional debug info is produced. All that happens after this point is just the elapsed time increasing.

Re: Hive query stalls on hive cli

@Michel Meulpolder

Check whether any other job is running. Most of the time, the default queue is blocked because a previous job is consuming all the resources, and your job just sits and waits.

You can use Ambari to check the job listing. Your job ID is application_1453543525470_0004.

Alternatively, log in to the box, list the applications, and kill the one that is blocking the queue:

yarn application -list

yarn application -kill <Application-Id>
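To see whether something else is holding the queue, the `yarn application -list` output can be reduced to the application IDs other than your own. A minimal bash sketch follows; the function name other_yarn_apps is hypothetical, and it simply pattern-matches IDs of the form application_<timestamp>_<sequence> anywhere in the input instead of parsing the table columns, so it does not depend on the exact column layout.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `yarn application -list` output on stdin and print
# every application ID except the one given as $1, one per line, de-duplicated.
# IDs are matched by pattern (they can also appear inside tracking URLs).
other_yarn_apps() {
  local mine=$1
  grep -oE 'application_[0-9]+_[0-9]+' \
    | sort -u \
    | grep -vxF -- "$mine" \
    || true   # finding no other applications is not an error
}

# Typical use on the sandbox (requires the yarn CLI):
#   yarn application -list | other_yarn_apps application_1453543525470_0004
# Review the printed IDs before killing anything with:
#   yarn application -kill <Application-Id>
```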

Re: Hive query stalls on hive cli

Re: Hive query stalls on hive cli

It is the only job running...

Thanks for the link to the tuning guide. I was wondering, though: could something more basic be wrong (i.e. something that doesn't need sophisticated tuning)? It is a standard Sandbox on an off-the-shelf Azure VM, and I am just following the simple instructions from the Hello World tutorial. The tutorial doesn't mention any specific configuration changes (apart from having enough resources, which an A6 should have?). Thanks for your help.

Re: Hive query stalls on hive cli

@Michel Meulpolder Could you check the YARN log through Ambari? What's in the application log for app ID 004?

Re: Hive query stalls on hive cli

@Michel Meulpolder Try the same query with the MapReduce engine:

set hive.execution.engine=mr;

select max(mpg) from truck_mileage;

Re: Hive query stalls on hive cli

Strangely, checking the YARN scheduler via Ambari, it turns out that the job (004) finished successfully after 31 minutes. That seems like a very long time, though. I now notice that 'Cluster Metrics' says total memory = 2.20GB. Could this be the problem? The VM has 28GB, so something seems wrong with the memory allocation.

(I also tried to check the application log, but clicking 'logs' points to an invalid URL.)

Re: Hive query stalls on hive cli

@Michel Meulpolder Generally, Ambari does a good job of predicting the settings. You can also use this:

https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.4/bk_installing_manually_book/content/determi...

Follow the above link and I think you will get enough details about your cluster.
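The guide linked above derives YARN memory settings from a node's cores, disks and RAM, which matters here because the sandbox reported only 2.20GB of total cluster memory on a 28GB VM. Below is a minimal bash sketch of the core formula as I recall it from that guide (please check the exact tables there): containers = min(2 × CORES, 1.8 × DISKS, available RAM / minimum container size), and RAM per container = max(minimum container size, available RAM / containers). The reserved memory and minimum container size are read off the guide's tables and passed in by hand; rounding 1.8 × DISKS up to a whole number is my assumption, and the function name yarn_memory_sketch is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch of the HDP YARN memory-sizing formula (as recalled; verify against the guide).
# Usage: yarn_memory_sketch CORES DISKS TOTAL_RAM_MB RESERVED_MB MIN_CONTAINER_MB
#   RESERVED_MB and MIN_CONTAINER_MB are taken by hand from the guide's tables.
yarn_memory_sketch() {
  local cores=$1 disks=$2 total_mb=$3 reserved_mb=$4 min_container_mb=$5
  local available_mb=$(( total_mb - reserved_mb ))

  # containers = min(2 * CORES, ceil(1.8 * DISKS), available / MIN_CONTAINER_SIZE)
  local by_cores=$(( 2 * cores ))
  local by_disks=$(( (18 * disks + 9) / 10 ))   # integer ceil(1.8 * disks); rounding is an assumption
  local by_ram=$(( available_mb / min_container_mb ))
  local containers=$by_cores
  if (( by_disks < containers )); then containers=$by_disks; fi
  if (( by_ram < containers )); then containers=$by_ram; fi
  if (( containers < 1 )); then containers=1; fi  # avoid division by zero (not from the guide)

  # RAM per container = max(MIN_CONTAINER_SIZE, available / containers)
  local ram_mb=$(( available_mb / containers ))
  if (( ram_mb < min_container_mb )); then ram_mb=$min_container_mb; fi

  echo "containers=$containers"
  echo "ram_per_container_mb=$ram_mb"
  echo "yarn.nodemanager.resource.memory-mb=$(( containers * ram_mb ))"
  echo "yarn.scheduler.minimum-allocation-mb=$ram_mb"
}
```

For example, `yarn_memory_sketch 4 1 28672 4096 2048` describes a 4-core, 28GB node with one data disk, 4GB reserved and a 2048MB minimum container; the disk count and reserved amount here are illustrative assumptions, not values measured on the A6 VM.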
