
PolyBase and Cloudera - Error: File could only be replicated to 0 nodes instead of minReplication (=1)

Contributor

Hello everyone,

 

We want to connect our SQL Server 2016 Enterprise instance via PolyBase to our Kerberized on-premises Hadoop cluster running Cloudera 5.14. I followed the Microsoft PolyBase guide to configure PolyBase and passed all four checkpoints successfully. Unfortunately, we are not able to export tables from SQL Server to our Hadoop cluster.

A short summary of the four checkpoints from the PolyBase guide:

  • Checkpoint 1: Authenticated against the KDC and received a TGT.
  • Checkpoint 2: As described in the troubleshooting guide, PolyBase makes an attempt to access HDFS and fails because the request does not yet contain the necessary Service Ticket.
  • Checkpoint 3: A second hex dump indicates that SQL Server successfully used the TGT and acquired the applicable Service Ticket for the name node's SPN from the KDC.
  • Checkpoint 4: SQL Server was authenticated by Hadoop using the ST (Service Ticket) and was granted a session to access the secured resource.

Uploading a local file to HDFS works fine, but exporting a small table from SQL Server to HDFS throws the following exception.

Exception from primary NameNode:

IPC Server handler 22 on 8020, call org.apache.hadoop.hdfs.protocol.ClientProtocol.addBlock from 10.100.160.13:53900 Call#805 Retry#0
java.io.IOException: File /PolybaseTest/QID2585_20180706_150246_1.parq.gz could only be replicated to 0 nodes instead of minReplication (=1).  There are 3 datanode(s) running and 3 node(s) are excluded in this operation.
	at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1724)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3448)
	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:690)
	at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.addBlock(AuthorizationProviderProxyClientProtocol.java:217)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:506)
	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2281)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2277)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1917)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2275)

On our SQL Server we get almost the same exception.

Exception from SQL-Server:

Cannot execute the query "Remote Query" against OLE DB provider "SQLNCLI11" for linked server "SQLNCLI11". 110802;An internal DMS error occurred that caused this operation to fail. Details: Exception: Microsoft.SqlServer.DataWarehouse.DataMovement.Common.ExternalAccess.HdfsAccessException, Message: Java exception raised on call to HdfsBridge_DestroyRecordWriter: Error [File /PolybaseTest/QID2585_20180706_150246_7.parq.gz could only be replicated to 0 nodes instead of minReplication (=1).  There are 3 datanode(s) running and 3 node(s) are excluded in this operation.
	at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1724)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3448)
	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:690)
	at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.addBlock(AuthorizationProviderProxyClientProtocol.java:217)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:506)
	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2281)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2277)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1917)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2275)
] occurred while accessing external file.

 

I appreciate any help!

1 ACCEPTED SOLUTION

Contributor

After opening ports 1004 and 1006, we are now able to write data into our cluster.

Many thanks to weichiu; without his hint about enabling debug logging for the classes org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy and org.apache.hadoop.net.NetworkTopology, I would not have been able to see the problem.


6 REPLIES

Expert Contributor

Block placement is a very complex algorithm. I would suggest enabling debug logging for the classes org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy and org.apache.hadoop.net.NetworkTopology on the NameNode (or just raise the whole NameNode log level to DEBUG). The debug log should give an explanation as to why it couldn't choose DataNodes to write to.
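For reference, a minimal log4j snippet that enables just these two loggers would look like the lines below; on CDH this would typically be added to the NameNode's log4j safety valve in Cloudera Manager (the exact placement is an assumption and may differ by version):

# Enable DEBUG output for block placement decisions only
log4j.logger.org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy=DEBUG
log4j.logger.org.apache.hadoop.net.NetworkTopology=DEBUG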

Contributor

Hello weichiu,

 

thank you very much for your reply!

Below you can find the NameNode logs for block placement and network topology.

 

 

1:21:28.874 AM	INFO	Server				Auth successful for hdfs@MYCOMPANY.REALM.COM (auth:KERBEROS)
1:21:28.876 AM	INFO	ServiceAuthorizationManager	Authorization successful for hdfs@MYCOMPANY.REALM.COM (auth:KERBEROS) for protocol=interface org.apache.hadoop.hdfs.protocol.ClientProtocol
1:21:28.888 AM	DEBUG	NetworkTopology			Choosing random from 3 available nodes on node /default, scope=/default, excludedScope=null, excludeNodes=[]
1:21:28.888 AM	DEBUG	NetworkTopology			chooseRandom returning X.X.X.45:1004
1:21:28.888 AM	DEBUG	NetworkTopology			Choosing random from 3 available nodes on node /default, scope=/default, excludedScope=null, excludeNodes=[]
1:21:28.888 AM	DEBUG	NetworkTopology			Failed to find datanode (scope="" excludedScope="/default").
1:21:28.888 AM	DEBUG	NetworkTopology			chooseRandom returning X.X.X.43:1004
1:21:28.888 AM	DEBUG	BlockPlacementPolicy		Failed to choose remote rack (location = ~/default), fallback to local rack
							org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy$NotEnoughReplicasException: 
							   at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseRandom(BlockPlacementPolicyDefault.java:746)
							   .
							   .
							   .
1:21:28.889 AM	DEBUG	NetworkTopology			Failed to find datanode (scope="" excludedScope="/default").
1:21:28.889 AM	DEBUG	NetworkTopology			Choosing random from 2 available nodes on node /default, scope=/default, excludedScope=null, excludeNodes=[X.X.X.45:1004]
1:21:28.889 AM	DEBUG	BlockPlacementPolicy		Failed to choose remote rack (location = ~/default), fallback to local rack
							org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy$NotEnoughReplicasException: 
							   at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseRandom(BlockPlacementPolicyDefault.java:746)
							   .
							   .
							   .
1:21:28.890 AM	DEBUG	NetworkTopology			Choosing random from 2 available nodes on node /default, scope=/default, excludedScope=null, excludeNodes=[X.X.X.43:1004]
1:21:28.889 AM	DEBUG	NetworkTopology			chooseRandom returning X.X.X.44:1004
1:21:28.890 AM	DEBUG	NetworkTopology			Node X.X.X.43:1004 is excluded, continuing.
1:21:28.890 AM	DEBUG	NetworkTopology			Node X.X.X.43:1004 is excluded, continuing.
1:21:28.890 AM	DEBUG	NetworkTopology			Node X.X.X.43:1004 is excluded, continuing.
1:21:28.890 AM	DEBUG	NetworkTopology			chooseRandom returning X.X.X.44:1004
1:21:28.890 AM	DEBUG	NetworkTopology			Failed to find datanode (scope="" excludedScope="/default").
1:21:28.890 AM	DEBUG	BlockPlacementPolicy		Failed to choose remote rack (location = ~/default), fallback to local rack
							org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy$NotEnoughReplicasException: 
							   at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseRandom(BlockPlacementPolicyDefault.java:746)
							   .
							   .
							   .
1:21:28.890 AM	DEBUG	NetworkTopology			Failed to find datanode (scope="" excludedScope="/default").
1:21:28.890 AM	DEBUG	BlockPlacementPolicy		Failed to choose remote rack (location = ~/default), fallback to local rack
							org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy$NotEnoughReplicasException: 
							   at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseRandom(BlockPlacementPolicyDefault.java:746)
							   .
							   .
							   .														
1:21:28.890 AM	DEBUG	NetworkTopology			Choosing random from 1 available nodes on node /default, scope=/default, excludedScope=null, excludeNodes=[X.X.X.45:1004, X.X.X.44:1004]
1:21:28.890 AM	DEBUG	NetworkTopology			Choosing random from 1 available nodes on node /default, scope=/default, excludedScope=null, excludeNodes=[X.X.X.43:1004, X.X.X.44:1004]
1:21:28.890 AM	DEBUG	NetworkTopology			Node X.X.X.44:1004 is excluded, continuing.
1:21:28.890 AM	DEBUG	NetworkTopology			Node X.X.X.43:1004 is excluded, continuing.
1:21:28.891 AM	DEBUG	NetworkTopology			Node X.X.X.44:1004 is excluded, continuing.
1:21:28.891 AM	DEBUG	NetworkTopology			Node X.X.X.43:1004 is excluded, continuing.
1:21:28.891 AM	DEBUG	NetworkTopology			Node X.X.X.45:1004 is excluded, continuing.
1:21:28.891 AM	DEBUG	NetworkTopology			chooseRandom returning X.X.X.45:1004
1:21:28.891 AM	DEBUG	NetworkTopology			Node X.X.X.45:1004 is excluded, continuing.
1:21:28.891 AM	DEBUG	NetworkTopology			chooseRandom returning X.X.X.43:1004
1:21:28.891 AM	INFO	StateChange			BLOCK* allocateBlock: /PolybaseTest/QID2601_20180707_12128_3.parq.gz. BP-1767765873-X.X.X.41-1525850808562 blk_1073840961_100142{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[[DISK]DS-d069f8ba-9a4e-4b64-863d-9b818b27d298:NORMAL:X.X.X.43:1004|RBW], ReplicaUnderConstruction[[DISK]DS-e30cd499-5230-4e68-a6f7-4517e8f5b367:NORMAL:X.X.X.44:1004|RBW], ReplicaUnderConstruction[[DISK]DS-246055b9-1252-4d70-8b4a-6406346da99f:NORMAL:X.X.X.45:1004|RBW]]}
1:21:28.891 AM	INFO	StateChange			BLOCK* allocateBlock: /PolybaseTest/QID2601_20180707_12128_4.parq.gz. BP-1767765873-X.X.X.41-1525850808562 blk_1073840962_100143{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[[DISK]DS-246055b9-1252-4d70-8b4a-6406346da99f:NORMAL:X.X.X.45:1004|RBW], ReplicaUnderConstruction[[DISK]DS-e30cd499-5230-4e68-a6f7-4517e8f5b367:NORMAL:X.X.X.44:1004|RBW], ReplicaUnderConstruction[[DISK]DS-d069f8ba-9a4e-4b64-863d-9b818b27d298:NORMAL:X.X.X.43:1004|RBW]]}
1:21:28.891 AM	DEBUG	NetworkTopology			Choosing random from 3 available nodes on node /default, scope=/default, excludedScope=null, excludeNodes=[]
1:21:28.892 AM	DEBUG	NetworkTopology			chooseRandom returning X.X.X.44:1004
1:21:28.892 AM	DEBUG	NetworkTopology			Failed to find datanode (scope="" excludedScope="/default").
1:21:28.892 AM	DEBUG	BlockPlacementPolicy		Failed to choose remote rack (location = ~/default), fallback to local rack
							org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy$NotEnoughReplicasException: 
							   at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseRandom(BlockPlacementPolicyDefault.java:746)
							   .
							   .
							   .			

 

From the logs, two points stand out to me:

  1. Somehow all three DataNodes in our cluster are listed under the key word "excludeNodes". This looks strange to me.
  2. According to the CDH port list, port 1004 is used for secure DataNode communication. I have to say that port 1004 is not reachable at the moment from our SQL Server (PolyBase) host; see the quick check below.
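A simple way to confirm the reachability problem from the Windows host running PolyBase is PowerShell's Test-NetConnection (the DataNode hostname below is a placeholder):

Test-NetConnection datanode1.mycompany.com -Port 1004

If TcpTestSucceeded comes back False for the DataNodes' transfer port, the client cannot stream block data to them.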

Many thanks in advance.

 

Baris

 

Contributor

What we will do next is open ports 1004 and 1006 for secure communication.

Can someone please tell us whether the logs from our NameNode look correct or strange?

We would appreciate more assistance with interpreting the logs.

 

Many thanks.

Baris

Explorer

While reading from and writing to HDFS, I am getting the error below on the Java program side:

 

Exception in thread "main" org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /user/hdfs/test11/tutorials11.txt could only be replicated to 0 nodes instead of minReplication (=1). There are 3 datanode(s) running and 3 node(s) are excluded in this operation.

Mentor
Have you followed the solution given above? Regardless of where you are writing into your cluster from, unless the client has full access to all of your DataNode hosts and their ports, you will face this error.
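To illustrate the failure mode, here is a minimal HDFS write in Java (hostname and config values are placeholders, not taken from the thread): the NameNode allocates a block, but if the client cannot reach any DataNode's transfer port (1004/1006 on a Kerberized CDH cluster), every DataNode ends up in the excluded list and the write fails with exactly this "replicated to 0 nodes" error.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsWriteTest {
    public static void main(String[] args) throws Exception {
        // Assumes core-site.xml/hdfs-site.xml are on the classpath and, on a
        // secure cluster, that the user already holds a valid Kerberos ticket.
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://namenode.mycompany.com:8020"); // placeholder

        // The NameNode only brokers block allocation; the block payload is
        // streamed directly from this client to the DataNodes, so their
        // transfer ports must be reachable from wherever this program runs.
        try (FileSystem fs = FileSystem.get(conf);
             FSDataOutputStream out = fs.create(new Path("/user/hdfs/test11/tutorials11.txt"))) {
            out.writeBytes("connectivity test\n");
        }
    }
}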