Support Questions
Find answers, ask questions, and share your expertise

Create Hive external table from Kafka and got exception when execute query

Create Hive external table from Kafka and got exception when execute query

New Contributor

I have following the sample to create hive external table from kafka and i got the following error when i query data from hive table.


https://docs.hortonworks.com/HDPDocuments/HDF3/HDF-3.4.0/kafka-hive-integration/content/hive_create_...


I am using HDP 3.0.1 sandbox. Thanks.


0: jdbc:hive2://sandbox-hdp.hortonworks.com:2> SELECT MAX(`timestamp`) FROM kafka_table;

INFO : Compiling command(queryId=hive_20190627102832_f7e72f81-14d4-4c47-a19e-129644a8797a): SELECT MAX(`timestamp`) FROM kafka_table

INFO : Semantic Analysis Completed (retrial = false)

INFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:_c0, type:timestamp, comment:null)], properties:null)

INFO : Completed compiling command(queryId=hive_20190627102832_f7e72f81-14d4-4c47-a19e-129644a8797a); Time taken: 0.424 seconds

INFO : Executing command(queryId=hive_20190627102832_f7e72f81-14d4-4c47-a19e-129644a8797a): SELECT MAX(`timestamp`) FROM kafka_table

INFO : Query ID = hive_20190627102832_f7e72f81-14d4-4c47-a19e-129644a8797a

INFO : Total jobs = 1

INFO : Launching Job 1 out of 1

INFO : Starting task [Stage-1:MAPRED] in serial mode

INFO : Subscribed to counters: [] for queryId: hive_20190627102832_f7e72f81-14d4-4c47-a19e-129644a8797a

INFO : Session is already open

INFO : Dag name: SELECT MAX(`timestamp`) FROM kafka_table (Stage-1)

INFO : Status: Running (Executing on YARN cluster with App id application_1561628401449_0003)


----------------------------------------------------------------------------------------------

VERTICES MODE STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED

----------------------------------------------------------------------------------------------

Map 1 container INITIALIZING -1 0 0 -1 0 0

Reducer 2 container INITED 1 0 0 1 0 0

----------------------------------------------------------------------------------------------

VERTICES: 00/02 [>>--------------------------] 0% ELAPSED TIME: 30.03 s

----------------------------------------------------------------------------------------------

ERROR : Status: Failed

ERROR : Vertex failed, vertexName=Map 1, vertexId=vertex_1561628401449_0003_2_00, diagnostics=[Vertex vertex_1561628401449_0003_2_00 [Map 1] killed/failed due to:ROOT_INPUT_INIT_FAILURE, Vertex Input: kafka_table initializer failed, vertex=vertex_1561628401449_0003_2_00 [Map 1], java.io.IOException: java.util.concurrent.TimeoutException

at org.apache.hadoop.hive.kafka.KafkaInputFormat.computeSplits(KafkaInputFormat.java:143)

at org.apache.hadoop.hive.kafka.KafkaInputFormat.getSplits(KafkaInputFormat.java:69)

at org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:522)

at org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:777)

at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:243)

at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:278)

at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:269)

at java.security.AccessController.doPrivileged(Native Method)

at javax.security.auth.Subject.doAs(Subject.java:422)

at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)

at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:269)

at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:253)

at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:108)

at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:41)

at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:77)

at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)

at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)

at java.lang.Thread.run(Thread.java:748)

Caused by: java.util.concurrent.TimeoutException

at java.util.concurrent.FutureTask.get(FutureTask.java:205)

at org.apache.hadoop.hive.kafka.KafkaInputFormat.computeSplits(KafkaInputFormat.java:138)

... 17 more

]

ERROR : Vertex killed, vertexName=Reducer 2, vertexId=vertex_1561628401449_0003_2_01, diagnostics=[Vertex received Kill in INITED state., Vertex vertex_1561628401449_0003_2_01 [Reducer 2] killed/failed due to:OTHER_VERTEX_FAILURE]

ERROR : DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:1

INFO : org.apache.tez.common.counters.DAGCounter:

INFO : AM_CPU_MILLISECONDS: 1360

INFO : AM_GC_TIME_MILLIS: 0

ERROR : FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Map 1, vertexId=vertex_1561628401449_0003_2_00, diagnostics=[Vertex vertex_1561628401449_0003_2_00 [Map 1] killed/failed due to:ROOT_INPUT_INIT_FAILURE, Vertex Input: kafka_table initializer failed, vertex=vertex_1561628401449_0003_2_00 [Map 1], java.io.IOException: java.util.concurrent.TimeoutException

at org.apache.hadoop.hive.kafka.KafkaInputFormat.computeSplits(KafkaInputFormat.java:143)

at org.apache.hadoop.hive.kafka.KafkaInputFormat.getSplits(KafkaInputFormat.java:69)

at org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:522)

at org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:777)

at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:243)

at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:278)

at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:269)

at java.security.AccessController.doPrivileged(Native Method)

at javax.security.auth.Subject.doAs(Subject.java:422)

at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)

at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:269)

at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:253)

at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:108)

at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:41)

at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:77)

at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)

at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)

at java.lang.Thread.run(Thread.java:748)

Caused by: java.util.concurrent.TimeoutException

at java.util.concurrent.FutureTask.get(FutureTask.java:205)

at org.apache.hadoop.hive.kafka.KafkaInputFormat.computeSplits(KafkaInputFormat.java:138)

... 17 more

]Vertex killed, vertexName=Reducer 2, vertexId=vertex_1561628401449_0003_2_01, diagnostics=[Vertex received Kill in INITED state., Vertex vertex_1561628401449_0003_2_01 [Reducer 2] killed/failed due to:OTHER_VERTEX_FAILURE]DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:1

INFO : Completed executing command(queryId=hive_20190627102832_f7e72f81-14d4-4c47-a19e-129644a8797a); Time taken: 30.732 seconds

Error: Error while processing statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Map 1, vertexId=vertex_1561628401449_0003_2_00, diagnostics=[Vertex vertex_1561628401449_0003_2_00 [Map 1] killed/failed due to:ROOT_INPUT_INIT_FAILURE, Vertex Input: kafka_table initializer failed, vertex=vertex_1561628401449_0003_2_00 [Map 1], java.io.IOException: java.util.concurrent.TimeoutException

at org.apache.hadoop.hive.kafka.KafkaInputFormat.computeSplits(KafkaInputFormat.java:143)

at org.apache.hadoop.hive.kafka.KafkaInputFormat.getSplits(KafkaInputFormat.java:69)

at org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:522)

at org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:777)

at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:243)

at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:278)

at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:269)

at java.security.AccessController.doPrivileged(Native Method)

at javax.security.auth.Subject.doAs(Subject.java:422)

at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)

at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:269)

at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:253)

at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:108)

at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:41)

at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:77)

at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)

at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)

at java.lang.Thread.run(Thread.java:748)

Caused by: java.util.concurrent.TimeoutException

at java.util.concurrent.FutureTask.get(FutureTask.java:205)

at org.apache.hadoop.hive.kafka.KafkaInputFormat.computeSplits(KafkaInputFormat.java:138)

... 17 more

]Vertex killed, vertexName=Reducer 2, vertexId=vertex_1561628401449_0003_2_01, diagnostics=[Vertex received Kill in INITED state., Vertex vertex_1561628401449_0003_2_01 [Reducer 2] killed/failed due to:OTHER_VERTEX_FAILURE]DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:1 (state=08S01,code=2)





4 REPLIES 4

Re: Create Hive external table from Kafka and got exception when execute query

New Contributor

Hi, I encountered exactly the same problem. Were you able to solve it?

Re: Create Hive external table from Kafka and got exception when execute query

Rising Star

I can see that none of the containers are initialised yet:

 

Map 1 container INITIALIZING -1 0 0 -1 0 0

Reducer 2 container INITED 1 0 0 1 0 0

 

and eventually fails with timeout exception:

 

 java.io.IOException: java.util.concurrent.TimeoutException

 

Action Plan:

 

 1. The issue seems to be with yarn contention issue. Set a queue where it is unutilised.

  set tez.queue.name=<under utilised queue>

 

2. Increase the task container timeout

        set tez.task.timeout-ms=6000

 

Set the above values in beeline session (not in AMbari cluster) and re-run the query.

 

Attach below if query fails:

 

 1. set -v

  2. APplication logs

 3. Console output

Re: Create Hive external table from Kafka and got exception when execute query

New Contributor

@asish wrote:

I can see that none of the containers are initialised yet:

 

Map 1 container INITIALIZING -1 0 0 -1 0 0

Reducer 2 container INITED 1 0 0 1 0 0

 

and eventually fails with timeout exception:

 

 java.io.IOException: java.util.concurrent.TimeoutException

 

Action Plan:

 

 1. The issue seems to be with yarn contention issue. Set a queue where it is unutilised.

  set tez.queue.name=<under utilised queue>

 

2. Increase the task container timeout

        set tez.task.timeout-ms=6000

 

Set the above values in beeline session (not in AMbari cluster) and re-run the query.

 

Attach below if query fails:

 

 1. set -v

  2. APplication logs

 3. Console output


Hi, Problem still remains.

1. The issue seems to be with yarn contention issue. Set a queue where it is unutilised.

  set tez.queue.name=<under utilised queue>

Test has been run on unutilized cluster, default queue with 100% capacity allocated.

2. Increase the task container timeout

        set tez.task.timeout-ms=6000

Ambari/Hive/Config: tez.task.timeout-ms 90000

 

Problem occurs only when trying to query one particular topic. Console consumer works perfectly so no connection issues there.

 

Re: Create Hive external table from Kafka and got exception when execute query

Rising Star

The task is not getting launched

 

----------------------------------------------------------------------------------------------
 VERTICES MODE STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED
----------------------------------------------------------------------------------------------
Map 1 container INITIALIZING -1 0 0 -1 0 0
Reducer 2 container INITED 1 0 0 1 0 0
----------------------------------------------------------------------------------------------
VERTICES: 00/02 [>>--------------------------] 0% ELAPSED TIME: 29.84 s
----------------------------------------------------------------------------------------------
ERROR : Status: Failed

 

Can you try with the below settings:

 

Navigato Ambari => Tez ==> tez.am.resource.memory.mb =5120 and restart hive and tez.

 

Set the below in beeline or at the client session and try:

+++++

set hive.tez.container.size=5120;

set hive.tez.java.opts=-Xmx4096m;

set tez.runtime.io.sort.mb=2048;

set tez.task.resource.memory.mb=4096;

+++++++++

If issue persist try with below:

set hive.tez.container.size=10240;

set hive.tez.java.opts=-Xmx8192m;

set tez.runtime.io.sort.mb=4096;

set tez.task.resource.memory.mb=7680;

 

If it fails then issue might be with Kafka or  the yarn