Support Questions

Find answers, ask questions, and share your expertise
Announcements
Celebrating as our community reaches 100,000 members! Thank you!

Hive query failed due to ApplicationMaster for attempt timed out

avatar
Explorer

I have a Hive Table in ORC format, which I tried to query via Beeline:

 

SELECT `timestamp`, url FROM events WHERE id='0ef3c9ba6cb5' ORDER BY `timestamp` DESC;

 


However, this simple query failed with:

 

INFO : Compiling command(queryId=hive_20200605073915_22eb45aa-25f6-419a-9b55-57a0d98e3dac): select `timestamp`, url from events where partyid='0:3pu60uagp0:db698229-272e-4a1c-a18a-0ef3c9ba6cb5' order by `timestamp` desc
INFO : Warning: Map Join MAPJOIN[16][bigTable=?] in task 'Map 1' is a cross product
INFO : Semantic Analysis Completed (retrial = false)
INFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:timestamp, type:bigint, comment:null), FieldSchema(name:url, type:string, comment:null)], properties:null)
INFO : Completed compiling command(queryId=hive_20200605073915_22eb45aa-25f6-419a-9b55-57a0d98e3dac); Time taken: 0.724 seconds
INFO : Executing command(queryId=hive_20200605073915_22eb45aa-25f6-419a-9b55-57a0d98e3dac): select `timestamp`, url from events where partyid='0:3pu60uagp0:db698229-272e-4a1c-a18a-0ef3c9ba6cb5' order by `timestamp` desc
INFO : Query ID = hive_20200605073915_22eb45aa-25f6-419a-9b55-57a0d98e3dac
INFO : Total jobs = 1
INFO : Launching Job 1 out of 1
INFO : Starting task [Stage-1:MAPRED] in serial mode
INFO : Subscribed to counters: [] for queryId: hive_20200605073915_22eb45aa-25f6-419a-9b55-57a0d98e3dac
INFO : Tez session hasn't been created yet. Opening session
INFO : Dag name: select `timestamp`, url fr...desc (Stage-1)
INFO : Setting tez.task.scale.memory.reserve-fraction to 0.30000001192092896
INFO : Status: Running (Executing on YARN cluster with App id application_1586459578755_0105)

----------------------------------------------------------------------------------------------
VERTICES MODE STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED
----------------------------------------------------------------------------------------------

----------------------------------------------------------------------------------------------
VERTICES: 00/00 [>>--------------------------] 0% ELAPSED TIME: 9428.98 s
----------------------------------------------------------------------------------------------
ERROR : Status: Failed------------------------] 0% ELAPSED TIME: 9426.66 s
ERROR : Application application_1586459578755_0105 failed 2 times due to ApplicationMaster for attempt appattempt_1586459578755_0105_000002 timed out. Failing the application.
ERROR : FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Application application_1586459578755_0105 failed 2 times due to ApplicationMaster for attempt appattempt_1586459578755_0105_000002 timed out. Failing the application.
INFO : Completed executing command(queryId=hive_20200605073915_22eb45aa-25f6-419a-9b55-57a0d98e3dac); Time taken: 9433.73 seconds
Error: Error while processing statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Application application_1586459578755_0105 failed 2 times due to ApplicationMaster for attempt appattempt_1586459578755_0105_000002 timed out. Failing the application. (state=08S01,code=2)

 


Any clue what is this error about? Other queries take long too if they ever succeed. Any resource configuration I should look into?

1 REPLY 1

avatar
Explorer

FYI my table is partition by Year + Month + Day.

Total file size in HDFS  is 10TB.

Total records is 21 Billion records.

We have 8 data nodes in HDP.

 

I'm using HDP-3.0.1.0 with Ambari. Here's my current Hive config:

 

Tez Container Size: 3072 MB
HiveServer2 Heap Size: 4096 MB
Memory: 819.2 MB
Data per Reducer: 2042.9 MB

 

They are mostly the default values.

Do they make sense?

Any suggestion on which to increase / decrease for optimum performance?