
Tez "Could not get block locations" error when inserting

Explorer

I've been stuck on this for so long and I don't know what to do. Help ❤️

I installed Hive using Ambari. When I try to insert a row into a table, I get a Tez error. Here's what I've checked:

  • The Hive and Tez services are healthy.
  • I am using the root user, and passwordless login has been configured on all nodes.
  • root has its own /user/root directory and /tmp/hive/root directory on HDFS.
  • root is a Hive administrator and has all permissions on the related databases and tables.
  • I used yarn application -list to verify that no other tasks are running.
  • I used yarn node -list to confirm that all nodes are RUNNING.
  • I used hdfs dfsadmin -report to confirm that all datanodes are in the Normal state.
  • I used hadoop fsck to confirm that /warehouse/tablespace/managed/hive/root.db/test is healthy (the exact commands are consolidated below).
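
For reference, these are the checks above as I ran them (paths per this cluster):

yarn application -list                                        # no other applications running
yarn node -list                                               # all NodeManagers report RUNNING
hdfs dfsadmin -report                                         # all datanodes Normal
hadoop fsck /warehouse/tablespace/managed/hive/root.db/test   # reports HEALTHY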

My question is detailed below:

I have a table like this:

0: jdbc:hive2://hdp0:2181,hdp1:2181,hdp2:2181> desc test;
INFO  : Compiling command(queryId=hive_20230614090425_2f598338-0603-4ba5-b4a2-8b77796b6d1b): desc test
INFO  : Semantic Analysis Completed (retrial = false)
INFO  : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:col_name, type:string, comment:from deserializer), FieldSchema(name:data_type, type:string, comment:from deserializer), FieldSchema(name:comment, type:string, comment:from deserializer)], properties:null)
INFO  : Completed compiling command(queryId=hive_20230614090425_2f598338-0603-4ba5-b4a2-8b77796b6d1b); Time taken: 0.103 seconds
INFO  : Executing command(queryId=hive_20230614090425_2f598338-0603-4ba5-b4a2-8b77796b6d1b): desc test
INFO  : Starting task [Stage-0:DDL] in serial mode
INFO  : Completed executing command(queryId=hive_20230614090425_2f598338-0603-4ba5-b4a2-8b77796b6d1b); Time taken: 0.012 seconds
INFO  : OK
+-----------+------------+----------+
| col_name  | data_type  | comment  |
+-----------+------------+----------+
| id        | int        |          |
| score     | int        |          |
+-----------+------------+----------+
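
(For context, this is a plain managed table; presumably it was created with something like the following, though the exact DDL isn't shown here:)

create table test (id int, score int);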

When I try to insert data, I get an error:

0: jdbc:hive2://hdp0:2181,hdp1:2181,hdp2:2181> insert into table test values (1,1);
INFO  : Compiling command(queryId=hive_20230614090629_4118dca3-cb4d-4ac4-a5e7-28b932278055): insert into table test values (1,1)
INFO  : Semantic Analysis Completed (retrial = false)
INFO  : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:col1, type:int, comment:null), FieldSchema(name:col2, type:int, comment:null)], properties:null)
INFO  : Completed compiling command(queryId=hive_20230614090629_4118dca3-cb4d-4ac4-a5e7-28b932278055); Time taken: 0.263 seconds
INFO  : Executing command(queryId=hive_20230614090629_4118dca3-cb4d-4ac4-a5e7-28b932278055): insert into table test values (1,1)
INFO  : Query ID = hive_20230614090629_4118dca3-cb4d-4ac4-a5e7-28b932278055
INFO  : Total jobs = 1
INFO  : Launching Job 1 out of 1
INFO  : Starting task [Stage-1:MAPRED] in serial mode
WARN  : The session: sessionId=32ed8d7a-8acb-4e68-a4d3-20210f38c38b, queueName=null, user=root, doAs=true, isOpen=false, isDefault=false has not been opened
INFO  : Subscribed to counters: [] for queryId: hive_20230614090629_4118dca3-cb4d-4ac4-a5e7-28b932278055
INFO  : Tez session hasn't been created yet. Opening session
INFO  : Dag name: insert into table test values (1,1) (Stage-1)
INFO  : Dag submit failed due to java.io.IOException: Could not get block locations. Source file "/tmp/hive/root/_tez_session_dir/3816e222-7805-48e5-a726-9ee2d1d84c5d/.tez/application_1686711037964_0022/recovery/1/summary" - Aborting...block==null
        at org.apache.tez.dag.app.DAGAppMaster.startDAG(DAGAppMaster.java:2591)
        at org.apache.tez.dag.app.DAGAppMaster.submitDAGToAppMaster(DAGAppMaster.java:1407)
        at org.apache.tez.dag.api.client.DAGClientHandler.submitDAG(DAGClientHandler.java:143)
        at org.apache.tez.dag.api.client.rpc.DAGClientAMProtocolBlockingPBServerImpl.submitDAG(DAGClientAMProtocolBlockingPBServerImpl.java:184)
        at org.apache.tez.dag.api.client.rpc.DAGClientAMProtocolRPC$DAGClientAMProtocol$2.callBlockingMethod(DAGClientAMProtocolRPC.java:7636)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025)
        at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876)
        at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682)
Caused by: java.io.IOException: Could not get block locations. Source file "/tmp/hive/root/_tez_session_dir/3816e222-7805-48e5-a726-9ee2d1d84c5d/.tez/application_1686711037964_0022/recovery/1/summary" - Aborting...block==null
        at org.apache.hadoop.hdfs.DataStreamer.setupPipelineForAppendOrRecovery(DataStreamer.java:1477)
        at org.apache.hadoop.hdfs.DataStreamer.processDatanodeOrExternalError(DataStreamer.java:1256)
        at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:667)
 stack trace: [sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method), sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62), sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45), java.lang.reflect.Constructor.newInstance(Constructor.java:423), org.apache.tez.common.RPCUtil.instantiateException(RPCUtil.java:53), org.apache.tez.common.RPCUtil.instantiateRuntimeException(RPCUtil.java:85), org.apache.tez.common.RPCUtil.unwrapAndThrowException(RPCUtil.java:135), org.apache.tez.client.TezClient.submitDAGSession(TezClient.java:705), org.apache.tez.client.TezClient.submitDAG(TezClient.java:588), org.apache.hadoop.hive.ql.exec.tez.TezTask.submit(TezTask.java:543), org.apache.hadoop.hive.ql.exec.tez.TezTask.execute(TezTask.java:221), org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:212), org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:103), org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2712), org.apache.hadoop.hive.ql.Driver.execute(Driver.java:2383), org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:2055), org.apache.hadoop.hive.ql.Driver.run(Driver.java:1753), org.apache.hadoop.hive.ql.Driver.run(Driver.java:1747), org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:157), org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:226), org.apache.hive.service.cli.operation.SQLOperation.access$700(SQLOperation.java:87), org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:324), java.security.AccessController.doPrivileged(Native Method), javax.security.auth.Subject.doAs(Subject.java:422), org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730), org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:342), java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511), java.util.concurrent.FutureTask.run(FutureTask.java:266), java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511), java.util.concurrent.FutureTask.run(FutureTask.java:266), java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149), java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624), java.lang.Thread.run(Thread.java:750)] retrying...
ERROR : Failed to execute tez graph.
java.io.IOException: Could not get block locations. Source file "/tmp/hive/root/_tez_session_dir/3816e222-7805-48e5-a726-9ee2d1d84c5d/.tez/application_1686711037964_0023/tez-conf.pb" - Aborting...block==null
        at org.apache.hadoop.hdfs.DataStreamer.setupPipelineForAppendOrRecovery(DataStreamer.java:1477) ~[hadoop-hdfs-client-3.1.1.3.1.4.0-315.jar:?]
        at org.apache.hadoop.hdfs.DataStreamer.processDatanodeOrExternalError(DataStreamer.java:1256) ~[hadoop-hdfs-client-3.1.1.3.1.4.0-315.jar:?]
        at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:667) ~[hadoop-hdfs-client-3.1.1.3.1.4.0-315.jar:?]
ERROR : FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.tez.TezTask
INFO  : Completed executing command(queryId=hive_20230614090629_4118dca3-cb4d-4ac4-a5e7-28b932278055); Time taken: 5.046 seconds
Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.tez.TezTask (state=08S01,code=1)

 

Here's what I've done since the problem arose:

  • I recursively set the /tmp/root directory to 755 with chmod -R, but this did not solve the problem (the command is shown below).
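That is, assuming the HDFS path rather than the local /tmp/root:

hdfs dfs -chmod -R 755 /tmp/root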

3 REPLIES

Master Collaborator

@xiamu, this error can appear if the datanodes are not healthy. Does the job fail repeatedly, or does it succeed at times? Have you tried running it as a different user?

This is where it is failing:

 

private void setupPipelineForAppendOrRecovery() throws IOException {
    // Check number of datanodes. Note that if there is no healthy datanode,
    // this must be internal error because we mark external error in striped
    // outputstream only when all the streamers are in the DATA_STREAMING stage
    if (nodes == null || nodes.length == 0) {
      String msg = "Could not get block locations. " + "Source file \""
          + src + "\" - Aborting..." + this;
      LOG.warn(msg);
      lastException.set(new IOException(msg));
      streamerClosed = true;
      return;
    }
    setupPipelineInternal(nodes, storageTypes, storageIDs);
  }
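
In other words, the client ended up with no usable datanodes in the write pipeline. A quick way to check whether a plain HDFS write works at all from the same host, independent of Hive/Tez, is something like this (hypothetical test path):

hdfs dfs -put /etc/hosts /tmp/block_loc_test    # exercises the same write pipeline
hdfs dfs -rm -skipTrash /tmp/block_loc_test     # clean up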

 

Explorer

My datanodes are all normal:

hdfs dfsadmin -report
Configured Capacity: 707412281856 (658.83 GB)
Present Capacity: 592360489158 (551.68 GB)
DFS Remaining: 585697374208 (545.47 GB)
DFS Used: 6663114950 (6.21 GB)
DFS Used%: 1.12%
Replicated Blocks:
        Under replicated blocks: 5
        Blocks with corrupt replicas: 1
        Missing blocks: 0
        Missing blocks (with replication factor 1): 0
        Low redundancy blocks with highest priority to recover: 0
        Pending deletion blocks: 2
Erasure Coded Block Groups:
        Low redundancy block groups: 0
        Block groups with corrupt internal blocks: 0
        Missing block groups: 0
        Low redundancy blocks with highest priority to recover: 0
        Pending deletion blocks: 0

-------------------------------------------------
Live datanodes (3):

Name: 172.19.0.3:50010 (bgs1)
Hostname: bgs1
Decommission Status : Normal
Configured Capacity: 235804093952 (219.61 GB)
DFS Used: 2220986368 (2.07 GB)
Non DFS Used: 24545226240 (22.86 GB)
DFS Remaining: 195142991872 (181.74 GB)
DFS Used%: 0.94%
DFS Remaining%: 82.76%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 2
Last contact: Sat Jun 17 04:27:19 UTC 2023
Last Block Report: Sat Jun 17 04:17:01 UTC 2023
Num of Blocks: 130


Name: 172.19.0.4:50010 (bgs2)
Hostname: bgs2
Decommission Status : Normal
Configured Capacity: 235804093952 (219.61 GB)
DFS Used: 2220888064 (2.07 GB)
Non DFS Used: 24545361408 (22.86 GB)
DFS Remaining: 195277172736 (181.87 GB)
DFS Used%: 0.94%
DFS Remaining%: 82.81%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 2
Last contact: Sat Jun 17 04:27:20 UTC 2023
Last Block Report: Sat Jun 17 04:16:28 UTC 2023
Num of Blocks: 130


Name: 172.19.0.5:50010 (bgs3)
Hostname: bgs3
Decommission Status : Normal
Configured Capacity: 235804093952 (219.61 GB)
DFS Used: 2221240518 (2.07 GB)
Non DFS Used: 24544972090 (22.86 GB)
DFS Remaining: 195277209600 (181.87 GB)
DFS Used%: 0.94%
DFS Remaining%: 82.81%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 2
Last contact: Sat Jun 17 04:27:19 UTC 2023
Last Block Report: Sat Jun 17 04:17:38 UTC 2023
Num of Blocks: 130

The same error happens when I use different users.

 

It might be worth mentioning that I used Docker containers to simulate the cluster. When I went on to run the Ambari service checks, the HDFS service check also failed, so maybe the Tez error is caused by an underlying HDFS problem:

Traceback (most recent call last):
  File "/var/lib/ambari-agent/cache/stacks/HDP/3.0/services/HDFS/package/scripts/service_check.py", line 167, in <module>
    HdfsServiceCheck().execute()
  File "/usr/lib/ambari-agent/lib/resource_management/libraries/script/script.py", line 352, in execute
    method(env)
  File "/var/lib/ambari-agent/cache/stacks/HDP/3.0/services/HDFS/package/scripts/service_check.py", line 88, in service_check
    action="create_on_execute"
  File "/usr/lib/ambari-agent/lib/resource_management/core/base.py", line 166, in __init__
    self.env.run()
  File "/usr/lib/ambari-agent/lib/resource_management/core/environment.py", line 160, in run
    self.run_action(resource, action)
  File "/usr/lib/ambari-agent/lib/resource_management/core/environment.py", line 124, in run_action
    provider_action()
  File "/usr/lib/ambari-agent/lib/resource_management/libraries/providers/hdfs_resource.py", line 677, in action_create_on_execute
    self.action_delayed("create")
  File "/usr/lib/ambari-agent/lib/resource_management/libraries/providers/hdfs_resource.py", line 674, in action_delayed
    self.get_hdfs_resource_executor().action_delayed(action_name, self)
  File "/usr/lib/ambari-agent/lib/resource_management/libraries/providers/hdfs_resource.py", line 373, in action_delayed
    self.action_delayed_for_nameservice(None, action_name, main_resource)
  File "/usr/lib/ambari-agent/lib/resource_management/libraries/providers/hdfs_resource.py", line 403, in action_delayed_for_nameservice
    self._create_resource()
  File "/usr/lib/ambari-agent/lib/resource_management/libraries/providers/hdfs_resource.py", line 419, in _create_resource
    self._create_file(self.main_resource.resource.target, source=self.main_resource.resource.source, mode=self.mode)
  File "/usr/lib/ambari-agent/lib/resource_management/libraries/providers/hdfs_resource.py", line 534, in _create_file
    self.util.run_command(target, 'CREATE', method='PUT', overwrite=True, assertable_result=False, file_to_put=source, **kwargs)
  File "/usr/lib/ambari-agent/lib/resource_management/libraries/providers/hdfs_resource.py", line 214, in run_command
    return self._run_command(*args, **kwargs)
  File "/usr/lib/ambari-agent/lib/resource_management/libraries/providers/hdfs_resource.py", line 295, in _run_command
    raise WebHDFSCallException(err_msg, result_dict)
resource_management.libraries.providers.hdfs_resource.WebHDFSCallException: Execution of 'curl -sS -L -w '%{http_code}' -X PUT --data-binary @/var/lib/ambari-agent/tmp/hdfs-service-check -H 'Content-Type: application/octet-stream' 'http://bgm:50070/webhdfs/v1/tmp/id13ac0200_date521723?op=CREATE&user.name=hdfs&overwrite=True'' returned status_code=403. 
{
  "RemoteException": {
    "exception": "IOException", 
    "javaClassName": "java.io.IOException", 
    "message": "File /tmp/id13ac0200_date521723 could only be written to 0 of the 1 minReplication nodes. There are 3 datanode(s) running and 3 node(s) are excluded in this operation.\n\tat org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:2121)\n\tat org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.chooseTargetForNewBlock(FSDirWriteFileOp.java:286)\n\tat org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2706)\n\tat org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:875)\n\tat org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:561)\n\tat org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)\n\tat org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524)\n\tat org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025)\n\tat org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876)\n\tat org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822)\n\tat java.security.AccessController.doPrivileged(Native Method)\n\tat javax.security.auth.Subject.doAs(Subject.java:422)\n\tat org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)\n\tat org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682)\n"
  }
}

There are 3 datanode(s) running and 3 node(s) are excluded in this operation. Why?

Explorer (ACCEPTED SOLUTION)

Hi everybody,

I think this was an issue with Docker containerization. I redeployed on a VMware virtual machine today and there were no issues.
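
My guess is that the datanodes had registered with container-internal addresses (the 172.19.0.x IPs in the report above) that the client could not reach when setting up the write pipeline, which would explain why all 3 datanodes were "excluded". If you need to keep the cluster on Docker, a workaround that is often suggested (I have not verified it here) is to have clients address datanodes by hostname, with the container hostnames resolvable from the client, e.g. in hdfs-site.xml:

<property>
  <name>dfs.client.use.datanode.hostname</name>
  <value>true</value>
</property>
<property>
  <name>dfs.datanode.use.datanode.hostname</name>
  <value>true</value>
</property>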