
Tez: "Could not get block locations" error on insert

Explorer

I've been stuck on this for so long and I don't know what to do. Help ❤️

 

I have installed Hive using Ambari. When I try to insert a row into the table, I get a Tez error. Here's what I've checked:

  • The Hive and Tez services are running normally.
  • I am using the root user, and passwordless login is configured on all nodes.
  • root has its own /user/root and /tmp/hive/root directories on HDFS.
  • root is a Hive administrator and has full permissions on the relevant databases and tables.
  • yarn application -list verifies that no tasks are running.
  • yarn node -list confirms that all nodes are RUNNING.
  • hdfs dfsadmin -report confirms that all datanodes are in the Normal state.
  • hadoop fsck confirms that /warehouse/tablespace/managed/hive/root.db/test is healthy.

My question is detailed below:

I have a table like this:

0: jdbc:hive2://hdp0:2181,hdp1:2181,hdp2:2181> desc test;
INFO  : Compiling command(queryId=hive_20230614090425_2f598338-0603-4ba5-b4a2-8b77796b6d1b): desc test
INFO  : Semantic Analysis Completed (retrial = false)
INFO  : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:col_name, type:string, comment:from deserializer), FieldSchema(name:data_type, type:string, comment:from deserializer), FieldSchema(name:comment, type:string, comment:from deserializer)], properties:null)
INFO  : Completed compiling command(queryId=hive_20230614090425_2f598338-0603-4ba5-b4a2-8b77796b6d1b); Time taken: 0.103 seconds
INFO  : Executing command(queryId=hive_20230614090425_2f598338-0603-4ba5-b4a2-8b77796b6d1b): desc test
INFO  : Starting task [Stage-0:DDL] in serial mode
INFO  : Completed executing command(queryId=hive_20230614090425_2f598338-0603-4ba5-b4a2-8b77796b6d1b); Time taken: 0.012 seconds
INFO  : OK
+-----------+------------+----------+
| col_name  | data_type  | comment  |
+-----------+------------+----------+
| id        | int        |          |
| score     | int        |          |
+-----------+------------+----------+

When I try to insert data, I get an error:

0: jdbc:hive2://hdp0:2181,hdp1:2181,hdp2:2181> insert into table test values (1,1);
INFO  : Compiling command(queryId=hive_20230614090629_4118dca3-cb4d-4ac4-a5e7-28b932278055): insert into table test values (1,1)
INFO  : Semantic Analysis Completed (retrial = false)
INFO  : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:col1, type:int, comment:null), FieldSchema(name:col2, type:int, comment:null)], properties:null)
INFO  : Completed compiling command(queryId=hive_20230614090629_4118dca3-cb4d-4ac4-a5e7-28b932278055); Time taken: 0.263 seconds
INFO  : Executing command(queryId=hive_20230614090629_4118dca3-cb4d-4ac4-a5e7-28b932278055): insert into table test values (1,1)
INFO  : Query ID = hive_20230614090629_4118dca3-cb4d-4ac4-a5e7-28b932278055
INFO  : Total jobs = 1
INFO  : Launching Job 1 out of 1
INFO  : Starting task [Stage-1:MAPRED] in serial mode
WARN  : The session: sessionId=32ed8d7a-8acb-4e68-a4d3-20210f38c38b, queueName=null, user=root, doAs=true, isOpen=false, isDefault=false has not been opened
INFO  : Subscribed to counters: [] for queryId: hive_20230614090629_4118dca3-cb4d-4ac4-a5e7-28b932278055
INFO  : Tez session hasn't been created yet. Opening session
INFO  : Dag name: insert into table test values (1,1) (Stage-1)
INFO  : Dag submit failed due to java.io.IOException: Could not get block locations. Source file "/tmp/hive/root/_tez_session_dir/3816e222-7805-48e5-a726-9ee2d1d84c5d/.tez/application_1686711037964_0022/recovery/1/summary" - Aborting...block==null
        at org.apache.tez.dag.app.DAGAppMaster.startDAG(DAGAppMaster.java:2591)
        at org.apache.tez.dag.app.DAGAppMaster.submitDAGToAppMaster(DAGAppMaster.java:1407)
        at org.apache.tez.dag.api.client.DAGClientHandler.submitDAG(DAGClientHandler.java:143)
        at org.apache.tez.dag.api.client.rpc.DAGClientAMProtocolBlockingPBServerImpl.submitDAG(DAGClientAMProtocolBlockingPBServerImpl.java:184)
        at org.apache.tez.dag.api.client.rpc.DAGClientAMProtocolRPC$DAGClientAMProtocol$2.callBlockingMethod(DAGClientAMProtocolRPC.java:7636)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025)
        at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876)
        at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682)
Caused by: java.io.IOException: Could not get block locations. Source file "/tmp/hive/root/_tez_session_dir/3816e222-7805-48e5-a726-9ee2d1d84c5d/.tez/application_1686711037964_0022/recovery/1/summary" - Aborting...block==null
        at org.apache.hadoop.hdfs.DataStreamer.setupPipelineForAppendOrRecovery(DataStreamer.java:1477)
        at org.apache.hadoop.hdfs.DataStreamer.processDatanodeOrExternalError(DataStreamer.java:1256)
        at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:667)
 stack trace: [sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method), sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62), sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45), java.lang.reflect.Constructor.newInstance(Constructor.java:423), org.apache.tez.common.RPCUtil.instantiateException(RPCUtil.java:53), org.apache.tez.common.RPCUtil.instantiateRuntimeException(RPCUtil.java:85), org.apache.tez.common.RPCUtil.unwrapAndThrowException(RPCUtil.java:135), org.apache.tez.client.TezClient.submitDAGSession(TezClient.java:705), org.apache.tez.client.TezClient.submitDAG(TezClient.java:588), org.apache.hadoop.hive.ql.exec.tez.TezTask.submit(TezTask.java:543), org.apache.hadoop.hive.ql.exec.tez.TezTask.execute(TezTask.java:221), org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:212), org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:103), org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2712), org.apache.hadoop.hive.ql.Driver.execute(Driver.java:2383), org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:2055), org.apache.hadoop.hive.ql.Driver.run(Driver.java:1753), org.apache.hadoop.hive.ql.Driver.run(Driver.java:1747), org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:157), org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:226), org.apache.hive.service.cli.operation.SQLOperation.access$700(SQLOperation.java:87), org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:324), java.security.AccessController.doPrivileged(Native Method), javax.security.auth.Subject.doAs(Subject.java:422), org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730), org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:342), java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511), 
java.util.concurrent.FutureTask.run(FutureTask.java:266), java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511), java.util.concurrent.FutureTask.run(FutureTask.java:266), java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149), java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624), java.lang.Thread.run(Thread.java:750)] retrying...
ERROR : Failed to execute tez graph.
java.io.IOException: Could not get block locations. Source file "/tmp/hive/root/_tez_session_dir/3816e222-7805-48e5-a726-9ee2d1d84c5d/.tez/application_1686711037964_0023/tez-conf.pb" - Aborting...block==null
        at org.apache.hadoop.hdfs.DataStreamer.setupPipelineForAppendOrRecovery(DataStreamer.java:1477) ~[hadoop-hdfs-client-3.1.1.3.1.4.0-315.jar:?]
        at org.apache.hadoop.hdfs.DataStreamer.processDatanodeOrExternalError(DataStreamer.java:1256) ~[hadoop-hdfs-client-3.1.1.3.1.4.0-315.jar:?]
        at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:667) ~[hadoop-hdfs-client-3.1.1.3.1.4.0-315.jar:?]
ERROR : FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.tez.TezTask
INFO  : Completed executing command(queryId=hive_20230614090629_4118dca3-cb4d-4ac4-a5e7-28b932278055); Time taken: 5.046 seconds
Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.tez.TezTask (state=08S01,code=1)

 

Here's what I've done since the problem arose:

  • I set the /tmp/root directory to 755 with chmod -R, but this did not solve the problem.
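
One thing worth double-checking here: a chmod on the local /tmp/root is not the same as the Hive scratch directory, which lives on HDFS. A hedged sketch of the HDFS-side check, assuming the usual hive.exec.scratchdir default of /tmp/hive (verify the expected mode against your Hive version before applying anything):

```shell
#!/bin/sh
# Inspect (and optionally reset) the Hive scratch dir on HDFS -- note this
# is /tmp/hive on HDFS, not the local filesystem's /tmp. The 733 mode is
# the usual default for hive.exec.scratchdir; confirm for your version.
check_scratch_dir() {
  if ! command -v hdfs >/dev/null 2>&1; then
    echo "hdfs client not on PATH; run this on a cluster node" >&2
    return 0
  fi
  hdfs dfs -ls -d /tmp/hive /tmp/hive/root
  hdfs dfs -chmod 733 /tmp/hive
}
check_scratch_dir
```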
1 ACCEPTED SOLUTION

Explorer

Everybody,

I think this was an issue with Docker containerization. I redeployed on a VMware virtual machine today and there were no issues.
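
For readers who hit the same symptoms but want to keep a Docker setup: a common cause is that datanodes register with container-internal IPs (like the 172.19.0.x addresses in the dfsadmin report below) that clients outside the bridge network cannot reach. The properties below are real HDFS client/datanode settings, but whether they are sufficient depends on how the containers publish their ports, so treat this hdfs-site.xml fragment as a sketch rather than a guaranteed fix:

```xml
<!-- hdfs-site.xml: have clients and datanodes resolve datanodes by
     hostname instead of the container-internal IP they registered with -->
<property>
  <name>dfs.client.use.datanode.hostname</name>
  <value>true</value>
</property>
<property>
  <name>dfs.datanode.use.datanode.hostname</name>
  <value>true</value>
</property>
```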



Master Collaborator

@xiamu, this error can appear when the datanodes are not healthy. Does the job fail every time, or does it sometimes succeed? Have you tried running it as a different user?

This is where it is failing:

 

private void setupPipelineForAppendOrRecovery() throws IOException {
    // Check number of datanodes. Note that if there is no healthy datanode,
    // this must be internal error because we mark external error in striped
    // outputstream only when all the streamers are in the DATA_STREAMING stage
    if (nodes == null || nodes.length == 0) {
      String msg = "Could not get block locations. " + "Source file \""
          + src + "\" - Aborting..." + this;
      LOG.warn(msg);
      lastException.set(new IOException(msg));
      streamerClosed = true;
      return;
    }
    setupPipelineInternal(nodes, storageTypes, storageIDs);
  }

 

Explorer

My datanodes are all normal.

 

hdfs dfsadmin -report
Configured Capacity: 707412281856 (658.83 GB)
Present Capacity: 592360489158 (551.68 GB)
DFS Remaining: 585697374208 (545.47 GB)
DFS Used: 6663114950 (6.21 GB)
DFS Used%: 1.12%
Replicated Blocks:
        Under replicated blocks: 5
        Blocks with corrupt replicas: 1
        Missing blocks: 0
        Missing blocks (with replication factor 1): 0
        Low redundancy blocks with highest priority to recover: 0
        Pending deletion blocks: 2
Erasure Coded Block Groups:
        Low redundancy block groups: 0
        Block groups with corrupt internal blocks: 0
        Missing block groups: 0
        Low redundancy blocks with highest priority to recover: 0
        Pending deletion blocks: 0

-------------------------------------------------
Live datanodes (3):

Name: 172.19.0.3:50010 (bgs1)
Hostname: bgs1
Decommission Status : Normal
Configured Capacity: 235804093952 (219.61 GB)
DFS Used: 2220986368 (2.07 GB)
Non DFS Used: 24545226240 (22.86 GB)
DFS Remaining: 195142991872 (181.74 GB)
DFS Used%: 0.94%
DFS Remaining%: 82.76%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 2
Last contact: Sat Jun 17 04:27:19 UTC 2023
Last Block Report: Sat Jun 17 04:17:01 UTC 2023
Num of Blocks: 130


Name: 172.19.0.4:50010 (bgs2)
Hostname: bgs2
Decommission Status : Normal
Configured Capacity: 235804093952 (219.61 GB)
DFS Used: 2220888064 (2.07 GB)
Non DFS Used: 24545361408 (22.86 GB)
DFS Remaining: 195277172736 (181.87 GB)
DFS Used%: 0.94%
DFS Remaining%: 82.81%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 2
Last contact: Sat Jun 17 04:27:20 UTC 2023
Last Block Report: Sat Jun 17 04:16:28 UTC 2023
Num of Blocks: 130


Name: 172.19.0.5:50010 (bgs3)
Hostname: bgs3
Decommission Status : Normal
Configured Capacity: 235804093952 (219.61 GB)
DFS Used: 2221240518 (2.07 GB)
Non DFS Used: 24544972090 (22.86 GB)
DFS Remaining: 195277209600 (181.87 GB)
DFS Used%: 0.94%
DFS Remaining%: 82.81%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 2
Last contact: Sat Jun 17 04:27:19 UTC 2023
Last Block Report: Sat Jun 17 04:17:38 UTC 2023
Num of Blocks: 130

 

 

The same error happens when I use different users.

 

It might be worth mentioning that I used Docker containers to simulate the cluster. When I ran further tests, the Ambari HDFS service check also failed, so the Tez error may be caused by an underlying HDFS problem.

Traceback (most recent call last):
  File "/var/lib/ambari-agent/cache/stacks/HDP/3.0/services/HDFS/package/scripts/service_check.py", line 167, in <module>
    HdfsServiceCheck().execute()
  File "/usr/lib/ambari-agent/lib/resource_management/libraries/script/script.py", line 352, in execute
    method(env)
  File "/var/lib/ambari-agent/cache/stacks/HDP/3.0/services/HDFS/package/scripts/service_check.py", line 88, in service_check
    action="create_on_execute"
  File "/usr/lib/ambari-agent/lib/resource_management/core/base.py", line 166, in __init__
    self.env.run()
  File "/usr/lib/ambari-agent/lib/resource_management/core/environment.py", line 160, in run
    self.run_action(resource, action)
  File "/usr/lib/ambari-agent/lib/resource_management/core/environment.py", line 124, in run_action
    provider_action()
  File "/usr/lib/ambari-agent/lib/resource_management/libraries/providers/hdfs_resource.py", line 677, in action_create_on_execute
    self.action_delayed("create")
  File "/usr/lib/ambari-agent/lib/resource_management/libraries/providers/hdfs_resource.py", line 674, in action_delayed
    self.get_hdfs_resource_executor().action_delayed(action_name, self)
  File "/usr/lib/ambari-agent/lib/resource_management/libraries/providers/hdfs_resource.py", line 373, in action_delayed
    self.action_delayed_for_nameservice(None, action_name, main_resource)
  File "/usr/lib/ambari-agent/lib/resource_management/libraries/providers/hdfs_resource.py", line 403, in action_delayed_for_nameservice
    self._create_resource()
  File "/usr/lib/ambari-agent/lib/resource_management/libraries/providers/hdfs_resource.py", line 419, in _create_resource
    self._create_file(self.main_resource.resource.target, source=self.main_resource.resource.source, mode=self.mode)
  File "/usr/lib/ambari-agent/lib/resource_management/libraries/providers/hdfs_resource.py", line 534, in _create_file
    self.util.run_command(target, 'CREATE', method='PUT', overwrite=True, assertable_result=False, file_to_put=source, **kwargs)
  File "/usr/lib/ambari-agent/lib/resource_management/libraries/providers/hdfs_resource.py", line 214, in run_command
    return self._run_command(*args, **kwargs)
  File "/usr/lib/ambari-agent/lib/resource_management/libraries/providers/hdfs_resource.py", line 295, in _run_command
    raise WebHDFSCallException(err_msg, result_dict)
resource_management.libraries.providers.hdfs_resource.WebHDFSCallException: Execution of 'curl -sS -L -w '%{http_code}' -X PUT --data-binary @/var/lib/ambari-agent/tmp/hdfs-service-check -H 'Content-Type: application/octet-stream' 'http://bgm:50070/webhdfs/v1/tmp/id13ac0200_date521723?op=CREATE&user.name=hdfs&overwrite=True'' returned status_code=403. 
{
  "RemoteException": {
    "exception": "IOException", 
    "javaClassName": "java.io.IOException", 
    "message": "File /tmp/id13ac0200_date521723 could only be written to 0 of the 1 minReplication nodes. There are 3 datanode(s) running and 3 node(s) are excluded in this operation.\n\tat org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:2121)\n\tat org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.chooseTargetForNewBlock(FSDirWriteFileOp.java:286)\n\tat org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2706)\n\tat org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:875)\n\tat org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:561)\n\tat org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)\n\tat org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524)\n\tat org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025)\n\tat org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876)\n\tat org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822)\n\tat java.security.AccessController.doPrivileged(Native Method)\n\tat javax.security.auth.Subject.doAs(Subject.java:422)\n\tat org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)\n\tat org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682)\n"
  }
}

There are 3 datanode(s) running and 3 node(s) are excluded in this operation. Why?
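
Not from the original thread, but one way to read that message: the NameNode chose three datanodes, the client failed to connect to each one's data-transfer port, and all three ended up "excluded". A quick reachability sketch from the client host, using the hostnames and port from the dfsadmin report above (nc is assumed to be available):

```shell
#!/bin/sh
# Check whether each datanode's data-transfer port (50010 in the report
# above) is reachable from this host. In a Docker bridge network the
# container IPs are often unreachable from outside the bridge, which
# leaves the NameNode's chosen datanodes all "excluded".
probe_datanodes() {
  for host in bgs1 bgs2 bgs3; do
    if nc -z -w 2 "$host" 50010 2>/dev/null; then
      echo "$host:50010 reachable"
    else
      echo "$host:50010 NOT reachable"
    fi
  done
}
probe_datanodes
```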
