PolyBase and Cloudera error: File could only be replicated to 0 nodes instead of minReplication (=1)
Labels: HDFS
Created 07-06-2018 06:42 AM
Hello everyone,
we want to connect our SQL Server 2016 Enterprise instance via PolyBase to our Kerberized on-premises Hadoop cluster running Cloudera 5.14. I followed the Microsoft PolyBase guide to configure PolyBase and passed all four checkpoints. Unfortunately, we are not able to export tables from SQL Server to our Hadoop cluster.
Short information about the four checkpoints from the PolyBase guide (a small verification sketch follows the list):
- Checkpoint 1: Authenticated against the KDC and received a TGT
- Checkpoint 2: As described in the troubleshooting guide, PolyBase makes an attempt to access HDFS and fails because the request does not contain the necessary Service Ticket.
- Checkpoint 3: A second hex dump indicates that SQL Server successfully used the TGT and acquired the applicable Service Ticket for the name node's SPN from the KDC.
- Checkpoint 4: SQL Server was authenticated by Hadoop using the ST (Service Ticket) and a session was granted to access the secured resource.
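For context, here is a minimal Java sketch of this kind of Kerberized HDFS access check, assuming the standard Hadoop client libraries; the NameNode URI, principal, and keytab path are placeholders, not our real values:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.security.UserGroupInformation;

    public class KerberizedHdfsCheck {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            conf.set("fs.defaultFS", "hdfs://namenode.mycompany.com:8020"); // placeholder NameNode URI
            conf.set("hadoop.security.authentication", "kerberos");

            // Log in with a keytab instead of relying on a ticket cache (placeholder paths).
            UserGroupInformation.setConfiguration(conf);
            UserGroupInformation.loginUserFromKeytab(
                    "hdfs@MYCOMPANY.REALM.COM", "/etc/security/keytabs/hdfs.keytab");

            // A directory listing only talks to the NameNode RPC port (8020),
            // so it can succeed even when the DataNode ports are unreachable.
            try (FileSystem fs = FileSystem.get(conf)) {
                for (FileStatus status : fs.listStatus(new Path("/PolybaseTest"))) {
                    System.out.println(status.getPath());
                }
            }
        }
    }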
Uploading a local file to our HDFS works fine, but exporting a small table from SQL Server to HDFS throws the following exception.
Exception from primary NameNode:
IPC Server handler 22 on 8020, call org.apache.hadoop.hdfs.protocol.ClientProtocol.addBlock from 10.100.160.13:53900 Call#805 Retry#0
java.io.IOException: File /PolybaseTest/QID2585_20180706_150246_1.parq.gz could only be replicated to 0 nodes instead of minReplication (=1). There are 3 datanode(s) running and 3 node(s) are excluded in this operation.
    at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1724)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3448)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:690)
    at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.addBlock(AuthorizationProviderProxyClientProtocol.java:217)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:506)
    at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2281)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2277)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1917)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2275)
On the SQL Server side we get almost the same exception.
Exception from SQL-Server:
Cannot execute the query "Remote Query" against OLE DB provider "SQLNCLI11" for linked server "SQLNCLI11". 110802;An internal DMS error occurred that caused this operation to fail. Details: Exception: Microsoft.SqlServer.DataWarehouse.DataMovement.Common.ExternalAccess.HdfsAccessException, Message: Java exception raised on call to HdfsBridge_DestroyRecordWriter: Error [File /PolybaseTest/QID2585_20180706_150246_7.parq.gz could only be replicated to 0 nodes instead of minReplication (=1). There are 3 datanode(s) running and 3 node(s) are excluded in this operation.
    at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1724)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3448)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:690)
    at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.addBlock(AuthorizationProviderProxyClientProtocol.java:217)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:506)
    at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2281)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2277)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1917)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2275)
] occurred while accessing external file.
I appreciate any help!
Created 07-06-2018 01:50 PM
Block placement is a very complex algorithm. I would suggest enabling debug logging for the classes org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy and org.apache.hadoop.net.NetworkTopology on the NameNode (or just setting the NameNode log level to DEBUG). The debug log should give an explanation as to why it couldn't choose the DataNodes to write to.
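If you use Cloudera Manager, one way to do this (a sketch, assuming the standard log4j setup) is to add the two loggers below to the NameNode Logging Advanced Configuration Snippet (Safety Valve) and restart the NameNode:

    log4j.logger.org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy=DEBUG
    log4j.logger.org.apache.hadoop.net.NetworkTopology=DEBUG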
Created 07-06-2018 05:48 PM
Hello weichiu,
thank you very much for your reply!
Below you can find the logs for BlockPlacementPolicy and NetworkTopology.
1:21:28.874 AM INFO Server Auth successful for hdfs@MYCOMPANY.REALM.COM (auth:KERBEROS)
1:21:28.876 AM INFO ServiceAuthorizationManager Authorization successful for hdfs@MYCOMPANY.REALM.COM (auth:KERBEROS) for protocol=interface org.apache.hadoop.hdfs.protocol.ClientProtocol
1:21:28.888 AM DEBUG NetworkTopology Choosing random from 3 available nodes on node /default, scope=/default, excludedScope=null, excludeNodes=[]
1:21:28.888 AM DEBUG NetworkTopology chooseRandom returning X.X.X.45:1004
1:21:28.888 AM DEBUG NetworkTopology Choosing random from 3 available nodes on node /default, scope=/default, excludedScope=null, excludeNodes=[]
1:21:28.888 AM DEBUG NetworkTopology Failed to find datanode (scope="" excludedScope="/default").
1:21:28.888 AM DEBUG NetworkTopology chooseRandom returning X.X.X.43:1004
1:21:28.888 AM DEBUG BlockPlacementPolicy Failed to choose remote rack (location = ~/default), fallback to local rack
    org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy$NotEnoughReplicasException:
    at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseRandom(BlockPlacementPolicyDefault.java:746)
    . . .
1:21:28.889 AM DEBUG NetworkTopology Failed to find datanode (scope="" excludedScope="/default").
1:21:28.889 AM DEBUG NetworkTopology Choosing random from 2 available nodes on node /default, scope=/default, excludedScope=null, excludeNodes=[X.X.X.45:1004]
1:21:28.889 AM DEBUG BlockPlacementPolicy Failed to choose remote rack (location = ~/default), fallback to local rack
    org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy$NotEnoughReplicasException:
    at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseRandom(BlockPlacementPolicyDefault.java:746)
    . . .
1:21:28.890 AM DEBUG NetworkTopology Choosing random from 2 available nodes on node /default, scope=/default, excludedScope=null, excludeNodes=[X.X.X.43:1004]
1:21:28.889 AM DEBUG NetworkTopology chooseRandom returning X.X.X.44:1004
1:21:28.890 AM DEBUG NetworkTopology Node X.X.X.43:1004 is excluded, continuing.
1:21:28.890 AM DEBUG NetworkTopology Node X.X.X.43:1004 is excluded, continuing.
1:21:28.890 AM DEBUG NetworkTopology Node X.X.X.43:1004 is excluded, continuing.
1:21:28.890 AM DEBUG NetworkTopology chooseRandom returning X.X.X.44:1004
1:21:28.890 AM DEBUG NetworkTopology Failed to find datanode (scope="" excludedScope="/default").
1:21:28.890 AM DEBUG BlockPlacementPolicy Failed to choose remote rack (location = ~/default), fallback to local rack
    org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy$NotEnoughReplicasException:
    at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseRandom(BlockPlacementPolicyDefault.java:746)
    . . .
1:21:28.890 AM DEBUG NetworkTopology Failed to find datanode (scope="" excludedScope="/default").
1:21:28.890 AM DEBUG BlockPlacementPolicy Failed to choose remote rack (location = ~/default), fallback to local rack
    org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy$NotEnoughReplicasException:
    at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseRandom(BlockPlacementPolicyDefault.java:746)
    . . .
1:21:28.890 AM DEBUG NetworkTopology Choosing random from 1 available nodes on node /default, scope=/default, excludedScope=null, excludeNodes=[X.X.X.45:1004, X.X.X.44:1004]
1:21:28.890 AM DEBUG NetworkTopology Choosing random from 1 available nodes on node /default, scope=/default, excludedScope=null, excludeNodes=[X.X.X.43:1004, X.X.X.44:1004]
1:21:28.890 AM DEBUG NetworkTopology Node X.X.X.44:1004 is excluded, continuing.
1:21:28.890 AM DEBUG NetworkTopology Node X.X.X.43:1004 is excluded, continuing.
1:21:28.891 AM DEBUG NetworkTopology Node X.X.X.44:1004 is excluded, continuing.
1:21:28.891 AM DEBUG NetworkTopology Node X.X.X.43:1004 is excluded, continuing.
1:21:28.891 AM DEBUG NetworkTopology Node X.X.X.45:1004 is excluded, continuing.
1:21:28.891 AM DEBUG NetworkTopology chooseRandom returning X.X.X.45:1004
1:21:28.891 AM DEBUG NetworkTopology Node X.X.X.45:1004 is excluded, continuing.
1:21:28.891 AM DEBUG NetworkTopology chooseRandom returning X.X.X.43:1004
1:21:28.891 AM INFO StateChange BLOCK* allocateBlock: /PolybaseTest/QID2601_20180707_12128_3.parq.gz. BP-1767765873-X.X.X.41-1525850808562 blk_1073840961_100142{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[[DISK]DS-d069f8ba-9a4e-4b64-863d-9b818b27d298:NORMAL:X.X.X.43:1004|RBW], ReplicaUnderConstruction[[DISK]DS-e30cd499-5230-4e68-a6f7-4517e8f5b367:NORMAL:X.X.X.44:1004|RBW], ReplicaUnderConstruction[[DISK]DS-246055b9-1252-4d70-8b4a-6406346da99f:NORMAL:X.X.X.45:1004|RBW]]}
1:21:28.891 AM INFO StateChange BLOCK* allocateBlock: /PolybaseTest/QID2601_20180707_12128_4.parq.gz. BP-1767765873-X.X.X.41-1525850808562 blk_1073840962_100143{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[[DISK]DS-246055b9-1252-4d70-8b4a-6406346da99f:NORMAL:X.X.X.45:1004|RBW], ReplicaUnderConstruction[[DISK]DS-e30cd499-5230-4e68-a6f7-4517e8f5b367:NORMAL:X.X.X.44:1004|RBW], ReplicaUnderConstruction[[DISK]DS-d069f8ba-9a4e-4b64-863d-9b818b27d298:NORMAL:X.X.X.43:1004|RBW]]}
1:21:28.891 AM DEBUG NetworkTopology Choosing random from 3 available nodes on node /default, scope=/default, excludedScope=null, excludeNodes=[]
1:21:28.892 AM DEBUG NetworkTopology chooseRandom returning X.X.X.44:1004
1:21:28.892 AM DEBUG NetworkTopology Failed to find datanode (scope="" excludedScope="/default").
1:21:28.892 AM DEBUG BlockPlacementPolicy Failed to choose remote rack (location = ~/default), fallback to local rack
    org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy$NotEnoughReplicasException:
    at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseRandom(BlockPlacementPolicyDefault.java:746)
    . . .
From the logs I can see two points that stand out to me:
- Somehow all three DataNodes in our cluster appear under the keyword "excludeNodes". That looks strange to me.
- According to the CDH port list, port 1004 is used for secure DataNode communication. I have to say that port 1004 is currently not reachable from our SQL Server (PolyBase) host; see the probe sketch below.
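For the second point, a plain TCP probe shows whether the secure DataNode port is reachable at all; a minimal Java sketch, with a placeholder DataNode hostname:

    import java.net.InetSocketAddress;
    import java.net.Socket;

    public class PortProbe {
        public static void main(String[] args) throws Exception {
            // 1004 is the secure DataNode data transfer port on CDH; hostname is a placeholder.
            try (Socket socket = new Socket()) {
                socket.connect(new InetSocketAddress("datanode1.mycompany.com", 1004), 5000);
                System.out.println("port 1004 is reachable");
            }
        }
    }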
Many thanks in advance.
Baris
Created 07-09-2018 03:31 AM
Our next step is to open ports 1004 and 1006 for secure communication.
Can someone please tell us whether the NameNode logs above look correct or strange?
We would appreciate more assistance with interpreting the logs.
Many thanks.
Baris
Created 07-18-2018 12:26 AM
After opening ports 1004 and 1006, we are now able to write data into our cluster.
Many thanks to weichiu; without his hint about enabling debug logging for the classes org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy and org.apache.hadoop.net.NetworkTopology, I would not have been able to see the problem.
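For anyone who hits the same problem: ports 1004 and 1006 correspond to the secure DataNode settings in hdfs-site.xml, and both must be reachable from every PolyBase node. The excerpt below shows the CDH defaults, assuming they have not been changed on your cluster:

    <property>
      <name>dfs.datanode.address</name>
      <value>0.0.0.0:1004</value> <!-- secure DataNode data transfer port -->
    </property>
    <property>
      <name>dfs.datanode.http.address</name>
      <value>0.0.0.0:1006</value> <!-- secure DataNode HTTP port -->
    </property>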
Created 07-30-2018 09:32 AM
While reading from and writing to HDFS, I am getting the error below on the Java program side:
Exception in thread "main" org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /user/hdfs/test11/tutorials11.txt could only be replicated to 0 nodes instead of minReplication (=1). There are 3 datanode(s) running and 3 node(s) are excluded in this operation.
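A minimal Java sketch that exercises the same write path (the NameNode URI and file content are placeholders, and the path is taken from the error above):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsWriteRepro {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            conf.set("fs.defaultFS", "hdfs://namenode.mycompany.com:8020"); // placeholder

            try (FileSystem fs = FileSystem.get(conf);
                 FSDataOutputStream out = fs.create(new Path("/user/hdfs/test11/tutorials11.txt"))) {
                // Writing data forces the client to open a pipeline to a DataNode; if no
                // DataNode can be chosen, the "replicated to 0 nodes" error surfaces here.
                out.writeBytes("test\n");
            }
        }
    }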