Created 06-14-2023 02:22 AM
I've been stuck on this for so long and I don't know what to do. Help ❤️
I installed Hive using Ambari. When I try to insert a row into a table, I get a Tez error. The details are below.
I have a table like this:
0: jdbc:hive2://hdp0:2181,hdp1:2181,hdp2:2181> desc test;
INFO : Compiling command(queryId=hive_20230614090425_2f598338-0603-4ba5-b4a2-8b77796b6d1b): desc test
INFO : Semantic Analysis Completed (retrial = false)
INFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:col_name, type:string, comment:from deserializer), FieldSchema(name:data_type, type:string, comment:from deserializer), FieldSchema(name:comment, type:string, comment:from deserializer)], properties:null)
INFO : Completed compiling command(queryId=hive_20230614090425_2f598338-0603-4ba5-b4a2-8b77796b6d1b); Time taken: 0.103 seconds
INFO : Executing command(queryId=hive_20230614090425_2f598338-0603-4ba5-b4a2-8b77796b6d1b): desc test
INFO : Starting task [Stage-0:DDL] in serial mode
INFO : Completed executing command(queryId=hive_20230614090425_2f598338-0603-4ba5-b4a2-8b77796b6d1b); Time taken: 0.012 seconds
INFO : OK
+-----------+------------+----------+
| col_name | data_type | comment |
+-----------+------------+----------+
| id | int | |
| score | int | |
+-----------+------------+----------+
When I try to insert data, I get an error:
0: jdbc:hive2://hdp0:2181,hdp1:2181,hdp2:2181> insert into table test values (1,1);
INFO : Compiling command(queryId=hive_20230614090629_4118dca3-cb4d-4ac4-a5e7-28b932278055): insert into table test values (1,1)
INFO : Semantic Analysis Completed (retrial = false)
INFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:col1, type:int, comment:null), FieldSchema(name:col2, type:int, comment:null)], properties:null)
INFO : Completed compiling command(queryId=hive_20230614090629_4118dca3-cb4d-4ac4-a5e7-28b932278055); Time taken: 0.263 seconds
INFO : Executing command(queryId=hive_20230614090629_4118dca3-cb4d-4ac4-a5e7-28b932278055): insert into table test values (1,1)
INFO : Query ID = hive_20230614090629_4118dca3-cb4d-4ac4-a5e7-28b932278055
INFO : Total jobs = 1
INFO : Launching Job 1 out of 1
INFO : Starting task [Stage-1:MAPRED] in serial mode
WARN : The session: sessionId=32ed8d7a-8acb-4e68-a4d3-20210f38c38b, queueName=null, user=root, doAs=true, isOpen=false, isDefault=false has not been opened
INFO : Subscribed to counters: [] for queryId: hive_20230614090629_4118dca3-cb4d-4ac4-a5e7-28b932278055
INFO : Tez session hasn't been created yet. Opening session
INFO : Dag name: insert into table test values (1,1) (Stage-1)
INFO : Dag submit failed due to java.io.IOException: Could not get block locations. Source file "/tmp/hive/root/_tez_session_dir/3816e222-7805-48e5-a726-9ee2d1d84c5d/.tez/application_1686711037964_0022/recovery/1/summary" - Aborting...block==null
at org.apache.tez.dag.app.DAGAppMaster.startDAG(DAGAppMaster.java:2591)
at org.apache.tez.dag.app.DAGAppMaster.submitDAGToAppMaster(DAGAppMaster.java:1407)
at org.apache.tez.dag.api.client.DAGClientHandler.submitDAG(DAGClientHandler.java:143)
at org.apache.tez.dag.api.client.rpc.DAGClientAMProtocolBlockingPBServerImpl.submitDAG(DAGClientAMProtocolBlockingPBServerImpl.java:184)
at org.apache.tez.dag.api.client.rpc.DAGClientAMProtocolRPC$DAGClientAMProtocol$2.callBlockingMethod(DAGClientAMProtocolRPC.java:7636)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682)
Caused by: java.io.IOException: Could not get block locations. Source file "/tmp/hive/root/_tez_session_dir/3816e222-7805-48e5-a726-9ee2d1d84c5d/.tez/application_1686711037964_0022/recovery/1/summary" - Aborting...block==null
at org.apache.hadoop.hdfs.DataStreamer.setupPipelineForAppendOrRecovery(DataStreamer.java:1477)
at org.apache.hadoop.hdfs.DataStreamer.processDatanodeOrExternalError(DataStreamer.java:1256)
at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:667)
stack trace: [
  sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method),
  sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62),
  sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45),
  java.lang.reflect.Constructor.newInstance(Constructor.java:423),
  org.apache.tez.common.RPCUtil.instantiateException(RPCUtil.java:53),
  org.apache.tez.common.RPCUtil.instantiateRuntimeException(RPCUtil.java:85),
  org.apache.tez.common.RPCUtil.unwrapAndThrowException(RPCUtil.java:135),
  org.apache.tez.client.TezClient.submitDAGSession(TezClient.java:705),
  org.apache.tez.client.TezClient.submitDAG(TezClient.java:588),
  org.apache.hadoop.hive.ql.exec.tez.TezTask.submit(TezTask.java:543),
  org.apache.hadoop.hive.ql.exec.tez.TezTask.execute(TezTask.java:221),
  org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:212),
  org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:103),
  org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2712),
  org.apache.hadoop.hive.ql.Driver.execute(Driver.java:2383),
  org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:2055),
  org.apache.hadoop.hive.ql.Driver.run(Driver.java:1753),
  org.apache.hadoop.hive.ql.Driver.run(Driver.java:1747),
  org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:157),
  org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:226),
  org.apache.hive.service.cli.operation.SQLOperation.access$700(SQLOperation.java:87),
  org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:324),
  java.security.AccessController.doPrivileged(Native Method),
  javax.security.auth.Subject.doAs(Subject.java:422),
  org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730),
  org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:342),
  java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511),
  java.util.concurrent.FutureTask.run(FutureTask.java:266),
  java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511),
  java.util.concurrent.FutureTask.run(FutureTask.java:266),
  java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149),
  java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624),
  java.lang.Thread.run(Thread.java:750)
] retrying...
ERROR : Failed to execute tez graph.
java.io.IOException: Could not get block locations. Source file "/tmp/hive/root/_tez_session_dir/3816e222-7805-48e5-a726-9ee2d1d84c5d/.tez/application_1686711037964_0023/tez-conf.pb" - Aborting...block==null
at org.apache.hadoop.hdfs.DataStreamer.setupPipelineForAppendOrRecovery(DataStreamer.java:1477) ~[hadoop-hdfs-client-3.1.1.3.1.4.0-315.jar:?]
at org.apache.hadoop.hdfs.DataStreamer.processDatanodeOrExternalError(DataStreamer.java:1256) ~[hadoop-hdfs-client-3.1.1.3.1.4.0-315.jar:?]
at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:667) ~[hadoop-hdfs-client-3.1.1.3.1.4.0-315.jar:?]
ERROR : FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.tez.TezTask
INFO : Completed executing command(queryId=hive_20230614090629_4118dca3-cb4d-4ac4-a5e7-28b932278055); Time taken: 5.046 seconds
Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.tez.TezTask (state=08S01,code=1)
Here's what I've done since the problem arose:
Created on 06-14-2023 10:47 PM - edited 06-15-2023 12:06 AM
@xiamu this error can appear when the datanodes are not healthy. Does the job fail repeatedly, or does it succeed at times? Have you tried running it with a different user?
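A few quick checks using only stock HDFS commands would tell you whether plain HDFS writes fail independently of Hive/Tez (the probe path below is just an example):

# Datanode health as the namenode sees it:
hdfs dfsadmin -report
# Bypass Hive/Tez and attempt a plain HDFS write as the same user; if this
# also aborts with a block-allocation error, the problem is in HDFS itself:
echo probe | hdfs dfs -put - /tmp/write-probe.txt
hdfs dfs -rm /tmp/write-probe.txt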
This is where it is failing, in the HDFS client's write pipeline (org.apache.hadoop.hdfs.DataStreamer):
private void setupPipelineForAppendOrRecovery() throws IOException {
// Check number of datanodes. Note that if there is no healthy datanode,
// this must be internal error because we mark external error in striped
// outputstream only when all the streamers are in the DATA_STREAMING stage
if (nodes == null || nodes.length == 0) {
String msg = "Could not get block locations. " + "Source file \""
+ src + "\" - Aborting..." + this;
LOG.warn(msg);
lastException.set(new IOException(msg));
streamerClosed = true;
return;
}
setupPipelineInternal(nodes, storageTypes, storageIDs);
}
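In other words, the client has no live datanodes left in its write pipeline. To see which block locations the namenode is actually handing out under the Tez session directory (path prefix taken from the error above; standard fsck options), you could run:

# Show files, blocks, and the datanodes holding each replica:
hdfs fsck /tmp/hive -files -blocks -locations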
Created on 06-16-2023 09:55 PM - edited 06-16-2023 09:57 PM
My datanodes all report as healthy:
hdfs dfsadmin -report
Configured Capacity: 707412281856 (658.83 GB)
Present Capacity: 592360489158 (551.68 GB)
DFS Remaining: 585697374208 (545.47 GB)
DFS Used: 6663114950 (6.21 GB)
DFS Used%: 1.12%
Replicated Blocks:
Under replicated blocks: 5
Blocks with corrupt replicas: 1
Missing blocks: 0
Missing blocks (with replication factor 1): 0
Low redundancy blocks with highest priority to recover: 0
Pending deletion blocks: 2
Erasure Coded Block Groups:
Low redundancy block groups: 0
Block groups with corrupt internal blocks: 0
Missing block groups: 0
Low redundancy blocks with highest priority to recover: 0
Pending deletion blocks: 0
-------------------------------------------------
Live datanodes (3):
Name: 172.19.0.3:50010 (bgs1)
Hostname: bgs1
Decommission Status : Normal
Configured Capacity: 235804093952 (219.61 GB)
DFS Used: 2220986368 (2.07 GB)
Non DFS Used: 24545226240 (22.86 GB)
DFS Remaining: 195142991872 (181.74 GB)
DFS Used%: 0.94%
DFS Remaining%: 82.76%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 2
Last contact: Sat Jun 17 04:27:19 UTC 2023
Last Block Report: Sat Jun 17 04:17:01 UTC 2023
Num of Blocks: 130
Name: 172.19.0.4:50010 (bgs2)
Hostname: bgs2
Decommission Status : Normal
Configured Capacity: 235804093952 (219.61 GB)
DFS Used: 2220888064 (2.07 GB)
Non DFS Used: 24545361408 (22.86 GB)
DFS Remaining: 195277172736 (181.87 GB)
DFS Used%: 0.94%
DFS Remaining%: 82.81%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 2
Last contact: Sat Jun 17 04:27:20 UTC 2023
Last Block Report: Sat Jun 17 04:16:28 UTC 2023
Num of Blocks: 130
Name: 172.19.0.5:50010 (bgs3)
Hostname: bgs3
Decommission Status : Normal
Configured Capacity: 235804093952 (219.61 GB)
DFS Used: 2221240518 (2.07 GB)
Non DFS Used: 24544972090 (22.86 GB)
DFS Remaining: 195277209600 (181.87 GB)
DFS Used%: 0.94%
DFS Remaining%: 82.81%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 2
Last contact: Sat Jun 17 04:27:19 UTC 2023
Last Block Report: Sat Jun 17 04:17:38 UTC 2023
Num of Blocks: 130
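The summary does show 5 under-replicated blocks and one block with a corrupt replica; these can be traced to specific files with stock fsck options:

# List files that currently have corrupt blocks:
hdfs fsck / -list-corruptfileblocks
# Walk the namespace; under-replicated blocks are flagged inline in the output:
hdfs fsck / -files -blocks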
The same error happens when I use different users.
It might be worth mentioning that I used Docker containers to simulate the cluster. When I ran the Ambari HDFS service check, it also reported an error, so I suspect the Tez failure is caused by an underlying HDFS problem.
Traceback (most recent call last):
  File "/var/lib/ambari-agent/cache/stacks/HDP/3.0/services/HDFS/package/scripts/service_check.py", line 167, in <module>
    HdfsServiceCheck().execute()
  File "/usr/lib/ambari-agent/lib/resource_management/libraries/script/script.py", line 352, in execute
    method(env)
  File "/var/lib/ambari-agent/cache/stacks/HDP/3.0/services/HDFS/package/scripts/service_check.py", line 88, in service_check
    action="create_on_execute"
  File "/usr/lib/ambari-agent/lib/resource_management/core/base.py", line 166, in __init__
    self.env.run()
  File "/usr/lib/ambari-agent/lib/resource_management/core/environment.py", line 160, in run
    self.run_action(resource, action)
  File "/usr/lib/ambari-agent/lib/resource_management/core/environment.py", line 124, in run_action
    provider_action()
  File "/usr/lib/ambari-agent/lib/resource_management/libraries/providers/hdfs_resource.py", line 677, in action_create_on_execute
    self.action_delayed("create")
  File "/usr/lib/ambari-agent/lib/resource_management/libraries/providers/hdfs_resource.py", line 674, in action_delayed
    self.get_hdfs_resource_executor().action_delayed(action_name, self)
  File "/usr/lib/ambari-agent/lib/resource_management/libraries/providers/hdfs_resource.py", line 373, in action_delayed
    self.action_delayed_for_nameservice(None, action_name, main_resource)
  File "/usr/lib/ambari-agent/lib/resource_management/libraries/providers/hdfs_resource.py", line 403, in action_delayed_for_nameservice
    self._create_resource()
  File "/usr/lib/ambari-agent/lib/resource_management/libraries/providers/hdfs_resource.py", line 419, in _create_resource
    self._create_file(self.main_resource.resource.target, source=self.main_resource.resource.source, mode=self.mode)
  File "/usr/lib/ambari-agent/lib/resource_management/libraries/providers/hdfs_resource.py", line 534, in _create_file
    self.util.run_command(target, 'CREATE', method='PUT', overwrite=True, assertable_result=False, file_to_put=source, **kwargs)
  File "/usr/lib/ambari-agent/lib/resource_management/libraries/providers/hdfs_resource.py", line 214, in run_command
    return self._run_command(*args, **kwargs)
  File "/usr/lib/ambari-agent/lib/resource_management/libraries/providers/hdfs_resource.py", line 295, in _run_command
    raise WebHDFSCallException(err_msg, result_dict)
resource_management.libraries.providers.hdfs_resource.WebHDFSCallException: Execution of 'curl -sS -L -w '%{http_code}' -X PUT --data-binary @/var/lib/ambari-agent/tmp/hdfs-service-check -H 'Content-Type: application/octet-stream' 'http://bgm:50070/webhdfs/v1/tmp/id13ac0200_date521723?op=CREATE&user.name=hdfs&overwrite=True'' returned status_code=403.
{
  "RemoteException": {
    "exception": "IOException",
    "javaClassName": "java.io.IOException",
    "message": "File /tmp/id13ac0200_date521723 could only be written to 0 of the 1 minReplication nodes. There are 3 datanode(s) running and 3 node(s) are excluded in this operation.
      at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:2121)
      at org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.chooseTargetForNewBlock(FSDirWriteFileOp.java:286)
      at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2706)
      at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:875)
      at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:561)
      at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
      at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524)
      at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025)
      at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876)
      at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822)
      at java.security.AccessController.doPrivileged(Native Method)
      at javax.security.auth.Subject.doAs(Subject.java:422)
      at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
      at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682)"
  }
}
There are 3 datanode(s) running and 3 node(s) are excluded in this operation. Why?
Created 06-18-2023 06:59 AM
Hi everybody,
I think this was an issue with the Docker containerization. I redeployed the cluster on VMware virtual machines today, and there were no issues.
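A likely explanation for why Docker triggers this (my reading of the symptoms, not something verified in this thread): the datanodes register with the namenode under container-internal addresses (note the 172.19.0.x IPs in the report above), which a client outside the Docker network cannot reach, so every datanode ends up "excluded" from the write pipeline. If redeploying is not an option, a commonly used mitigation is to have clients resolve datanodes by hostname and make sure those hostnames resolve to reachable addresses:

# Check the current value of the relevant client-side setting; setting it to
# "true" (in hdfs-site.xml, e.g. via Ambari) makes the client connect to
# datanodes by hostname instead of the IP address the namenode reports:
hdfs getconf -confKey dfs.client.use.datanode.hostname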