
Could not copy CSV file to HDFS with Pentaho


New Contributor

Hi!

I'm trying to upload a CSV file to HDP 3.0 (HDFS 3.1.1) with Pentaho Spoon 8.3 (the Hadoop Copy Files step).

I get the same error as described here: https://community.hortonworks.com/questions/108440/copying-files-to-hadoop-error.html
"Could not close the output stream for file"

[screenshot attachment: 110300-1565593143889.png]

An empty file appears in HDFS as a result. The host OS is Windows 8; in Pentaho.bat I set the hdfs user:

set HADOOP_USER_NAME=hdfs

I enabled debug logging for HDFS via the advanced log4j settings in Ambari (mostly by guesswork and reading the GitHub source code):

log4j.logger.org.apache.hadoop.ipc=DEBUG
log4j.logger.org.apache.hadoop.net=DEBUG
log4j.logger.org.apache.hadoop.net.NetworkTopology=DEBUG
log4j.logger.org.apache.hadoop.fs=DEBUG
log4j.logger.org.apache.hadoop.fs.FSNamesystem=DEBUG
log4j.logger.org.apache.hadoop.fs.FSDataOutputStream=DEBUG
log4j.logger.org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy=DEBUG
log4j.logger.org.apache.hadoop.hdfs.StateChange=DEBUG
log4j.logger.org.apache.hadoop.hdfs.DFSClient=DEBUG
log4j.logger.org.apache.hadoop.hdfs.protocol=DEBUG

Hadoop Root Logger = DEBUG,RFA

Now the NameNode log looks like this:

2019-08-12 06:30:00,840 DEBUG org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer (NameNodeRpcServer.java:create(757)) - *DIR* NameNode.create: file /user/hdfs/riskfactor.csv for DFSClient_NONMAPREDUCE_1981358968_179 at 172.18.0.3
2019-08-12 06:30:00,841 DEBUG org.apache.hadoop.hdfs.server.namenode.FSNamesystem (FSNamesystem.java:startFileInt(2378)) - DIR* NameSystem.startFile: src=/user/hdfs/riskfactor.csv, holder=DFSClient_NONMAPREDUCE_1981358968_179, clientMachine=172.18.0.3, createParent=true, replication=1, createFlag=[CREATE, OVERWRITE], blockSize=134217728, supportedVersions=[CryptoProtocolVersion{description='Encryption zones', version=2, unknownValue=null}]
2019-08-12 06:30:00,842 DEBUG org.apache.hadoop.hdfs.server.namenode.FSDirDeleteOp (FSDirDeleteOp.java:delete(55)) - DIR* FSDirectory.delete: /user/hdfs/riskfactor.csv
2019-08-12 06:30:00,845 DEBUG org.apache.hadoop.hdfs.server.namenode.FSDirDeleteOp (FSDirDeleteOp.java:unprotectedDelete(269)) - DIR* FSDirectory.unprotectedDelete: /user/hdfs/riskfactor.csv is removed
2019-08-12 06:30:00,848 DEBUG org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp (FSDirWriteFileOp.java:addFile(571)) - DIR* addFile: riskfactor.csv is added
2019-08-12 06:30:00,849 DEBUG org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp (FSDirWriteFileOp.java:startFile(407)) - DIR* NameSystem.startFile: added /user/hdfs/riskfactor.csv inode 681603 DFSClient_NONMAPREDUCE_1981358968_179
2019-08-12 06:30:00,851 INFO  org.apache.hadoop.hdfs.server.namenode.FSEditLog (FSEditLog.java:printStatistics(774)) - Number of transactions: 2 Total time for transactions(ms): 4 Number of transactions batched in Syncs: 1967516 Number of syncs: 2 SyncTimes(ms): 11
2019-08-12 06:30:00,852 DEBUG org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker (ProtobufRpcEngine.java:call(550)) - Served: create, queueTime= 0 procesingTime= 14
2019-08-12 06:30:00,853 DEBUG org.apache.hadoop.ipc.Server$Responder (Server.java:processResponse(1468)) - IPC Server handler 12 on 8020: responding to Call#178 Retry#0 org.apache.hadoop.hdfs.protocol.ClientProtocol.create from 172.18.0.3:41538
2019-08-12 06:30:00,853 DEBUG org.apache.hadoop.ipc.Server$Responder (Server.java:processResponse(1487)) - IPC Server handler 12 on 8020: responding to Call#178 Retry#0 org.apache.hadoop.hdfs.protocol.ClientProtocol.create from 172.18.0.3:41538 Wrote 91 bytes.
2019-08-12 06:30:01,016 DEBUG org.apache.hadoop.ipc.Server$Connection (Server.java:processOneRpc(2355)) -  got #179
2019-08-12 06:30:01,017 DEBUG org.apache.hadoop.ipc.Server$Handler (Server.java:run(2670)) - IPC Server handler 20 on 8020: Call#179 Retry#0 org.apache.hadoop.hdfs.protocol.ClientProtocol.addBlock from 172.18.0.3:41538 for RpcKind RPC_PROTOCOL_BUFFER
2019-08-12 06:30:01,018 DEBUG org.apache.hadoop.hdfs.server.namenode.FSNamesystem (FSNamesystem.java:getAdditionalBlock(2684)) - BLOCK* getAdditionalBlock: /user/hdfs/riskfactor.csv  inodeId 681603 for DFSClient_NONMAPREDUCE_1981358968_179
2019-08-12 06:30:01,041 DEBUG org.apache.hadoop.net.NetworkTopology (NetworkTopology.java:chooseRandom(538)) - Choosing random from 1 available nodes on node /default-rack, scope=/default-rack, excludedScope=null, excludeNodes=[]. numOfDatanodes=1.
2019-08-12 06:30:01,041 DEBUG org.apache.hadoop.net.NetworkTopology (NetworkTopology.java:chooseRandom(547)) - chooseRandom returning 172.18.0.2:50010
2019-08-12 06:30:01,043 DEBUG org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp (FSDirWriteFileOp.java:addBlock(516)) - DIR* FSDirectory.addBlock: /user/hdfs/riskfactor.csv with blk_1074001529_260815 block is added to the in-memory file system
2019-08-12 06:30:01,043 INFO  org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp (FSDirWriteFileOp.java:logAllocatedBlock(790)) - BLOCK* allocate blk_1074001529_260815, replicas=172.18.0.2:50010 for /user/hdfs/riskfactor.csv
2019-08-12 06:30:01,043 DEBUG org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp (FSDirWriteFileOp.java:persistNewBlock(750)) - persistNewBlock: /user/hdfs/riskfactor.csv with new block blk_1074001529_260815, current total block count is 1
2019-08-12 06:30:01,059 DEBUG org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker (ProtobufRpcEngine.java:call(550)) - Served: addBlock, queueTime= 1 procesingTime= 42
2019-08-12 06:30:01,059 DEBUG org.apache.hadoop.ipc.Server$Responder (Server.java:processResponse(1468)) - IPC Server handler 20 on 8020: responding to Call#179 Retry#0 org.apache.hadoop.hdfs.protocol.ClientProtocol.addBlock from 172.18.0.3:41538
2019-08-12 06:30:01,059 DEBUG org.apache.hadoop.ipc.Server$Responder (Server.java:processResponse(1487)) - IPC Server handler 20 on 8020: responding to Call#179 Retry#0 org.apache.hadoop.hdfs.protocol.ClientProtocol.addBlock from 172.18.0.3:41538 Wrote 490 bytes.
2019-08-12 06:30:01,188 DEBUG org.apache.hadoop.ipc.Server$Connection (Server.java:processOneRpc(2355)) -  got #180
2019-08-12 06:30:01,189 DEBUG org.apache.hadoop.ipc.Server$Handler (Server.java:run(2670)) - IPC Server handler 7 on 8020: Call#180 Retry#0 org.apache.hadoop.hdfs.protocol.ClientProtocol.getServerDefaults from 172.18.0.3:41538 for RpcKind RPC_PROTOCOL_BUFFER
2019-08-12 06:30:01,195 DEBUG org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker (ProtobufRpcEngine.java:call(550)) - Served: getServerDefaults, queueTime= 1 procesingTime= 6
2019-08-12 06:30:01,195 DEBUG org.apache.hadoop.ipc.Server$Responder (Server.java:processResponse(1468)) - IPC Server handler 7 on 8020: responding to Call#180 Retry#0 org.apache.hadoop.hdfs.protocol.ClientProtocol.getServerDefaults from 172.18.0.3:41538
2019-08-12 06:30:01,195 DEBUG org.apache.hadoop.ipc.Server$Responder (Server.java:processResponse(1487)) - IPC Server handler 7 on 8020: responding to Call#180 Retry#0 org.apache.hadoop.hdfs.protocol.ClientProtocol.getServerDefaults from 172.18.0.3:41538 Wrote 64 bytes.
2019-08-12 06:30:01,243 DEBUG org.apache.hadoop.ipc.Server$Connection (Server.java:processOneRpc(2355)) -  got #181
2019-08-12 06:30:01,243 DEBUG org.apache.hadoop.ipc.Server$Handler (Server.java:run(2670)) - IPC Server handler 11 on 8020: Call#181 Retry#0 org.apache.hadoop.hdfs.protocol.ClientProtocol.abandonBlock from 172.18.0.3:41538 for RpcKind RPC_PROTOCOL_BUFFER
2019-08-12 06:30:01,244 DEBUG org.apache.hadoop.hdfs.server.namenode.FSNamesystem (FSNamesystem.java:abandonBlock(2785)) - BLOCK* NameSystem.abandonBlock: BP-1419118625-172.17.0.2-1543512323726:blk_1074001529_260815 of file /user/hdfs/riskfactor.csv
2019-08-12 06:30:01,244 DEBUG org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp (FSDirWriteFileOp.java:unprotectedRemoveBlock(91)) - DIR* FSDirectory.removeBlock: /user/hdfs/riskfactor.csv with blk_1074001529_260815 block is removed from the file system
2019-08-12 06:30:01,244 DEBUG org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp (FSDirWriteFileOp.java:persistBlocks(111)) - persistBlocks: /user/hdfs/riskfactor.csv with 0 blocks is persisted to the file system
2019-08-12 06:30:01,244 DEBUG org.apache.hadoop.hdfs.server.namenode.FSNamesystem (FSNamesystem.java:abandonBlock(2794)) - BLOCK* NameSystem.abandonBlock: BP-1419118625-172.17.0.2-1543512323726:blk_1074001529_260815 is removed from pendingCreates
2019-08-12 06:30:01,244 DEBUG org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker (ProtobufRpcEngine.java:call(550)) - Served: abandonBlock, queueTime= 0 procesingTime= 1
2019-08-12 06:30:01,245 DEBUG org.apache.hadoop.ipc.Server$Responder (Server.java:processResponse(1468)) - FSEditLogAsync: responding to Call#181 Retry#0 org.apache.hadoop.hdfs.protocol.ClientProtocol.abandonBlock from 172.18.0.3:41538
2019-08-12 06:30:01,245 DEBUG org.apache.hadoop.ipc.Server$Responder (Server.java:processResponse(1487)) - FSEditLogAsync: responding to Call#181 Retry#0 org.apache.hadoop.hdfs.protocol.ClientProtocol.abandonBlock from 172.18.0.3:41538 Wrote 33 bytes.
2019-08-12 06:30:01,247 DEBUG org.apache.hadoop.ipc.Server$Connection (Server.java:processOneRpc(2355)) -  got #182
2019-08-12 06:30:01,247 DEBUG org.apache.hadoop.ipc.Server$Handler (Server.java:run(2670)) - IPC Server handler 5 on 8020: Call#182 Retry#0 org.apache.hadoop.hdfs.protocol.ClientProtocol.addBlock from 172.18.0.3:41538 for RpcKind RPC_PROTOCOL_BUFFER
2019-08-12 06:30:01,249 DEBUG org.apache.hadoop.hdfs.server.namenode.FSNamesystem (FSNamesystem.java:getAdditionalBlock(2684)) - BLOCK* getAdditionalBlock: /user/hdfs/riskfactor.csv  inodeId 681603 for DFSClient_NONMAPREDUCE_1981358968_179
2019-08-12 06:30:01,250 DEBUG org.apache.hadoop.net.NetworkTopology (NetworkTopology.java:chooseRandom(538)) - Choosing random from 0 available nodes on node /default-rack, scope=/default-rack, excludedScope=null, excludeNodes=[172.18.0.2:50010]. numOfDatanodes=1.
2019-08-12 06:30:01,250 DEBUG org.apache.hadoop.net.NetworkTopology (NetworkTopology.java:chooseRandom(547)) - chooseRandom returning null
2019-08-12 06:30:01,250 DEBUG org.apache.hadoop.hdfs.net.DFSNetworkTopology (DFSNetworkTopology.java:chooseRandomWithStorageTypeTwoTrial(125)) - No node to choose.
2019-08-12 06:30:01,250 DEBUG org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault (BlockPlacementPolicyDefault.java:chooseLocalRack(647)) - Failed to choose from local rack (location = /default-rack); the second replica is not found, retry choosing randomly
org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy$NotEnoughReplicasException:
        at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseRandom(BlockPlacementPolicyDefault.java:834)
        at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseRandom(BlockPlacementPolicyDefault.java:722)
        at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseLocalRack(BlockPlacementPolicyDefault.java:629)
        at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseLocalStorage(BlockPlacementPolicyDefault.java:589)
        at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTargetInOrder(BlockPlacementPolicyDefault.java:488)
        at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTarget(BlockPlacementPolicyDefault.java:419)
        at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTarget(BlockPlacementPolicyDefault.java:295)
        at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTarget(BlockPlacementPolicyDefault.java:148)
        at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTarget(BlockPlacementPolicyDefault.java:164)
        at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:2113)
        at org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.chooseTargetForNewBlock(FSDirWriteFileOp.java:286)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2706)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:875)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:561)
        at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025)
        at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876)
        at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682)
2019-08-12 06:30:01,250 DEBUG org.apache.hadoop.net.NetworkTopology (NetworkTopology.java:chooseRandom(538)) - Choosing random from 0 available nodes on node /, scope=, excludedScope=null, excludeNodes=[172.18.0.2:50010]. numOfDatanodes=1.
2019-08-12 06:30:01,250 DEBUG org.apache.hadoop.net.NetworkTopology (NetworkTopology.java:chooseRandom(547)) - chooseRandom returning null
2019-08-12 06:30:01,250 DEBUG org.apache.hadoop.hdfs.net.DFSNetworkTopology (DFSNetworkTopology.java:chooseRandomWithStorageTypeTwoTrial(125)) - No node to choose.
2019-08-12 06:30:01,251 WARN  org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault (BlockPlacementPolicyDefault.java:chooseTarget(432)) - Failed to place enough replicas, still in need of 1 to reach 1 (unavailableStorages=[], storagePolicy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, newBlock=true)
2019-08-12 06:30:01,251 DEBUG org.apache.hadoop.net.NetworkTopology (NetworkTopology.java:chooseRandom(538)) - Choosing random from 0 available nodes on node /default-rack, scope=/default-rack, excludedScope=null, excludeNodes=[172.18.0.2:50010]. numOfDatanodes=1.
2019-08-12 06:30:01,251 DEBUG org.apache.hadoop.net.NetworkTopology (NetworkTopology.java:chooseRandom(547)) - chooseRandom returning null
2019-08-12 06:30:01,251 DEBUG org.apache.hadoop.hdfs.net.DFSNetworkTopology (DFSNetworkTopology.java:chooseRandomWithStorageTypeTwoTrial(125)) - No node to choose.
2019-08-12 06:30:01,251 DEBUG org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault (BlockPlacementPolicyDefault.java:chooseLocalRack(647)) - Failed to choose from local rack (location = /default-rack); the second replica is not found, retry choosing randomly

You can see that at some point the client calls the abandonBlock procedure: abandonBlock from 172.18.0.3:41538

Why can this happen?
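One hedged reading of the log: in the HDFS write path the client first asks the NameNode for a block (addBlock), then streams the data directly to the DataNode on its data-transfer port (50010 here). If that direct connection fails, DFSClient calls abandonBlock and retries addBlock with the failed node excluded, which matches the excludeNodes=[172.18.0.2:50010] lines above; with only one DataNode there is then nothing left to choose, hence the NotEnoughReplicasException. So a quick sanity check is whether the DataNode address/port is even reachable from the client machine. A minimal probe using bash's /dev/tcp (e.g. from Git Bash on Windows; host and port are taken from the logs above and are assumptions for your setup):

```shell
#!/usr/bin/env bash
# Probe the DataNode data-transfer port from the client machine.
# host/port come from the addBlock response in the logs; adjust as needed.
host=172.18.0.2   # address the NameNode handed out for the block
port=50010        # DataNode data-transfer port (dfs.datanode.address)

if timeout 3 bash -c "cat < /dev/null > /dev/tcp/${host}/${port}" 2>/dev/null; then
  echo "datanode ${host}:${port} reachable"
else
  echo "datanode ${host}:${port} NOT reachable"
fi
```

If the port is not reachable, the NAT/port-forwarding and dfs.client.use.datanode.hostname angles are the right direction, since the NameNode is handing the client the Docker-internal address 172.18.0.2.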

I tried uploading the CSV through the web interfaces (both the Ambari Files view and the browse-directory page at http://sandbox-hdp.hortonworks.com:50070/explorer.html#/), and it is stored successfully. The logs in that case show a different IP, 172.18.0.2:

2019-08-12 07:19:22,565 DEBUG org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer (NameNodeRpcServer.java:create(757)) - *DIR* NameNode.create: file /tmp/riskfactor.csv for DFSClient_NONMAPREDUCE_1960509895_42 at 172.18.0.2
2019-08-12 07:19:22,565 DEBUG org.apache.hadoop.hdfs.server.namenode.FSNamesystem (FSNamesystem.java:startFileInt(2378)) - DIR* NameSystem.startFile: src=/tmp/riskfactor.csv, holder=DFSClient_NONMAPREDUCE_1960509895_42, clientMachine=172.18.0.2, createParent=true, replication=1, createFlag=[CREATE], blockSize=134217728, supportedVersions=[CryptoProtocolVersion{description='Encryption zones', version=2, unknownValue=null}]
2019-08-12 07:19:22,566 DEBUG org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp (FSDirWriteFileOp.java:addFile(571)) - DIR* addFile: riskfactor.csv is added
2019-08-12 07:19:22,567 DEBUG org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp (FSDirWriteFileOp.java:startFile(407)) - DIR* NameSystem.startFile: added /tmp/riskfactor.csv inode 681608 DFSClient_NONMAPREDUCE_1960509895_42
2019-08-12 07:19:22,568 DEBUG org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker (ProtobufRpcEngine.java:call(550)) - Served: create, queueTime= 0 procesingTime= 3
2019-08-12 07:19:22,568 DEBUG org.apache.hadoop.ipc.Server$Responder (Server.java:processResponse(1468)) - IPC Server handler 87 on 8020: responding to Call#1051 Retry#0 org.apache.hadoop.hdfs.protocol.ClientProtocol.create from 172.18.0.2:52358
2019-08-12 07:19:22,568 DEBUG org.apache.hadoop.ipc.Server$Responder (Server.java:processResponse(1487)) - IPC Server handler 87 on 8020: responding to Call#1051 Retry#0 org.apache.hadoop.hdfs.protocol.ClientProtocol.create from 172.18.0.2:52358 Wrote 92 bytes.
2019-08-12 07:19:22,616 DEBUG org.apache.hadoop.ipc.Server$Connection (Server.java:processOneRpc(2355)) -  got #1052
2019-08-12 07:19:22,616 DEBUG org.apache.hadoop.ipc.Server$Handler (Server.java:run(2670)) - IPC Server handler 92 on 8020: Call#1052 Retry#0 org.apache.hadoop.hdfs.protocol.ClientProtocol.addBlock from 172.18.0.2:52358 for RpcKind RPC_PROTOCOL_BUFFER
2019-08-12 07:19:22,616 DEBUG org.apache.hadoop.hdfs.server.namenode.FSNamesystem (FSNamesystem.java:getAdditionalBlock(2684)) - BLOCK* getAdditionalBlock: /tmp/riskfactor.csv  inodeId 681608 for DFSClient_NONMAPREDUCE_1960509895_42
2019-08-12 07:19:22,617 DEBUG org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp (FSDirWriteFileOp.java:addBlock(516)) - DIR* FSDirectory.addBlock: /tmp/riskfactor.csv with blk_1074001532_260818 block is added to the in-memory file system
2019-08-12 07:19:22,617 INFO  org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp (FSDirWriteFileOp.java:logAllocatedBlock(790)) - BLOCK* allocate blk_1074001532_260818, replicas=172.18.0.2:50010 for /tmp/riskfactor.csv
2019-08-12 07:19:22,617 DEBUG org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp (FSDirWriteFileOp.java:persistNewBlock(750)) - persistNewBlock: /tmp/riskfactor.csv with new block blk_1074001532_260818, current total block count is 1
2019-08-12 07:19:22,618 DEBUG org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker (ProtobufRpcEngine.java:call(550)) - Served: addBlock, queueTime= 0 procesingTime= 2
2019-08-12 07:19:22,619 DEBUG org.apache.hadoop.ipc.Server$Responder (Server.java:processResponse(1468)) - FSEditLogAsync: responding to Call#1052 Retry#0 org.apache.hadoop.hdfs.protocol.ClientProtocol.addBlock from 172.18.0.2:52358
2019-08-12 07:19:22,619 DEBUG org.apache.hadoop.ipc.Server$Responder (Server.java:processResponse(1487)) - FSEditLogAsync: responding to Call#1052 Retry#0 org.apache.hadoop.hdfs.protocol.ClientProtocol.addBlock from 172.18.0.2:52358 Wrote 492 bytes.
2019-08-12 07:19:22,627 DEBUG org.apache.hadoop.ipc.Server$Connection (Server.java:processOneRpc(2355)) -  got #1053
2019-08-12 07:19:22,627 DEBUG org.apache.hadoop.ipc.Server$Handler (Server.java:run(2670)) - IPC Server handler 88 on 8020: Call#1053 Retry#0 org.apache.hadoop.hdfs.protocol.ClientProtocol.getServerDefaults from 172.18.0.2:52358 for RpcKind RPC_PROTOCOL_BUFFER
2019-08-12 07:19:22,627 DEBUG org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker (ProtobufRpcEngine.java:call(550)) - Served: getServerDefaults, queueTime= 0 procesingTime= 0
2019-08-12 07:19:22,628 DEBUG org.apache.hadoop.ipc.Server$Responder (Server.java:processResponse(1468)) - IPC Server handler 88 on 8020: responding to Call#1053 Retry#0 org.apache.hadoop.hdfs.protocol.ClientProtocol.getServerDefaults from 172.18.0.2:52358
2019-08-12 07:19:22,628 DEBUG org.apache.hadoop.ipc.Server$Responder (Server.java:processResponse(1487)) - IPC Server handler 88 on 8020: responding to Call#1053 Retry#0 org.apache.hadoop.hdfs.protocol.ClientProtocol.getServerDefaults from 172.18.0.2:52358 Wrote 64 bytes.
2019-08-12 07:19:22,739 DEBUG org.apache.hadoop.ipc.Server$Connection (Server.java:processOneRpc(2355)) -  got #1054
2019-08-12 07:19:22,739 DEBUG org.apache.hadoop.ipc.Server$Handler (Server.java:run(2670)) - IPC Server handler 57 on 8020: Call#1054 Retry#0 org.apache.hadoop.hdfs.protocol.ClientProtocol.complete from 172.18.0.2:52358 for RpcKind RPC_PROTOCOL_BUFFER
2019-08-12 07:19:22,739 DEBUG org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp (FSDirWriteFileOp.java:completeFile(666)) - DIR* NameSystem.completeFile: /tmp/riskfactor.csv for DFSClient_NONMAPREDUCE_1960509895_42
2019-08-12 07:19:22,740 INFO  org.apache.hadoop.hdfs.server.namenode.FSNamesystem (FSNamesystem.java:checkBlocksComplete(2912)) - BLOCK* blk_1074001532_260818 is COMMITTED but not COMPLETE(numNodes= 0 <  minimum = 1) in file /tmp/riskfactor.csv
2019-08-12 07:19:22,742 DEBUG org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker (ProtobufRpcEngine.java:call(550)) - Served: complete, queueTime= 0 procesingTime= 3
2019-08-12 07:19:22,742 DEBUG org.apache.hadoop.ipc.Server$Responder (Server.java:processResponse(1468)) - IPC Server handler 57 on 8020: responding to Call#1054 Retry#0 org.apache.hadoop.hdfs.protocol.ClientProtocol.complete from 172.18.0.2:52358
2019-08-12 07:19:22,742 DEBUG org.apache.hadoop.ipc.Server$Responder (Server.java:processResponse(1487)) - IPC Server handler 57 on 8020: responding to Call#1054 Retry#0 org.apache.hadoop.hdfs.protocol.ClientProtocol.complete from 172.18.0.2:52358 Wrote 35 bytes.
2019-08-12 07:19:22,749 DEBUG org.apache.hadoop.ipc.Server$Connection (Server.java:processOneRpc(2355)) -  got #1055
2019-08-12 07:19:22,749 DEBUG org.apache.hadoop.ipc.Server$Handler (Server.java:run(2670)) - IPC Server handler 35 on 8020: Call#1055 Retry#0 org.apache.hadoop.hdfs.server.protocol.DatanodeProtocol.blockReceivedAndDeleted from 172.18.0.2:40772 for RpcKind RPC_PROTOCOL_BUFFER
2019-08-12 07:19:22,758 DEBUG org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer (NameNodeRpcServer.java:blockReceivedAndDeleted(1574)) - *BLOCK* NameNode.blockReceivedAndDeleted: from DatanodeRegistration(172.18.0.2:50010, datanodeUuid=9ba7d4f6-56d3-4c19-806d-7919c2444bb0, infoPort=50075, infoSecurePort=0, ipcPort=8010, storageInfo=lv=-57;cid=CID-489f8882-b69a-4b89-ad7a-bbca42038186;nsid=1278768352;c=1543512323726) 1 blocks.
2019-08-12 07:19:22,759 DEBUG org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker (ProtobufRpcEngine.java:call(550)) - Served: blockReceivedAndDeleted, queueTime= 8 procesingTime= 2
2019-08-12 07:19:22,759 DEBUG org.apache.hadoop.ipc.Server$Responder (Server.java:processResponse(1468)) - IPC Server handler 35 on 8020: responding to Call#1055 Retry#0 org.apache.hadoop.hdfs.server.protocol.DatanodeProtocol.blockReceivedAndDeleted from 172.18.0.2:40772
2019-08-12 07:19:22,759 DEBUG org.apache.hadoop.hdfs.server.blockmanagement.BlockManager (BlockManager.java:addStoredBlock(3219)) - BLOCK* addStoredBlock: 172.18.0.2:50010 is added to blk_1074001532_260818 (size=2314)
2019-08-12 07:19:22,760 DEBUG org.apache.hadoop.hdfs.server.blockmanagement.BlockManager (BlockManager.java:processIncrementalBlockReport(4020)) - BLOCK* block RECEIVED_BLOCK: blk_1074001532_260818 is received from 172.18.0.2:50010
2019-08-12 07:19:22,760 DEBUG org.apache.hadoop.hdfs.server.blockmanagement.BlockManager (BlockManager.java:processIncrementalBlockReport(4023)) - *BLOCK* NameNode.processIncrementalBlockReport: from 172.18.0.2:50010 receiving: 0, received: 1, deleted: 0
2019-08-12 07:19:22,759 DEBUG org.apache.hadoop.ipc.Server$Responder (Server.java:processResponse(1487)) - IPC Server handler 35 on 8020: responding to Call#1055 Retry#0 org.apache.hadoop.hdfs.server.protocol.DatanodeProtocol.blockReceivedAndDeleted from 172.18.0.2:40772 Wrote 33 bytes.
2019-08-12 07:19:23,154 DEBUG org.apache.hadoop.ipc.Server$Connection (Server.java:processOneRpc(2355)) -  got #1056
2019-08-12 07:19:23,154 DEBUG org.apache.hadoop.ipc.Server$Handler (Server.java:run(2670)) - IPC Server handler 85 on 8020: Call#1056 Retry#0 org.apache.hadoop.hdfs.protocol.ClientProtocol.complete from 172.18.0.2:52358 for RpcKind RPC_PROTOCOL_BUFFER
2019-08-12 07:19:23,155 DEBUG org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp (FSDirWriteFileOp.java:completeFile(666)) - DIR* NameSystem.completeFile: /tmp/riskfactor.csv for DFSClient_NONMAPREDUCE_1960509895_42
2019-08-12 07:19:23,156 DEBUG org.apache.hadoop.hdfs.server.namenode.FSNamesystem (FSNamesystem.java:closeFile(3951)) - closeFile: /tmp/riskfactor.csv with 1 blocks is persisted to the file system
2019-08-12 07:19:23,156 INFO  org.apache.hadoop.hdfs.server.namenode.FSNamesystem (FSNamesystem.java:completeFile(2865)) - DIR* completeFile: /tmp/riskfactor.csv is closed by DFSClient_NONMAPREDUCE_1960509895_42
2019-08-12 07:19:23,156 DEBUG org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker (ProtobufRpcEngine.java:call(550)) - Served: complete, queueTime= 1 procesingTime= 1

Could this be a routing issue?

[root@sandbox-hdp hdfs]# ip route list
default via 172.18.0.1 dev eth0
172.18.0.0/16 dev eth0 proto kernel scope link

There was also a "... src 172.18.0.2" entry that I removed by recreating the route, but nothing changed.

Some things I have already tried, with no luck:

- uploading another file

- uploading with another user (admin, root)

- uploading in another folder (/tmp, /user/hdfs, /user/admin)

- using another version of PDI

- forwarding port 50010 in the VirtualBox NAT settings

- setting dfs.client.use.datanode.hostname=true on both sides (Pentaho and the server)
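For reference, the client-side setting from the last bullet normally goes into the hdfs-site.xml that Spoon's Hadoop cluster configuration (shim) uses. A sketch; the exact shim directory depends on your PDI install:

```xml
<!-- hdfs-site.xml on the client (Spoon shim config) - sketch -->
<property>
  <name>dfs.client.use.datanode.hostname</name>
  <value>true</value>
  <description>Connect to DataNodes by hostname instead of the
  (Docker-internal) IP address the NameNode reports.</description>
</property>
```

Note this only helps if the client can actually resolve the DataNode's hostname (e.g. sandbox-hdp.hortonworks.com) to a reachable address, for instance via an entry in the Windows hosts file; `hdfs dfsadmin -report` on the sandbox shows which address/hostname the DataNode registered with.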

Any ideas what to do?


Re: Could not copy CSV file to HDFS with Pentaho

Contributor

If this is just for study, I recommend setting dfs.permissions.enabled=false.

 

And in my experience, Pentaho Spoon stores some info in its cache dir.

 

You can try it like this:

- exit Pentaho Spoon

- then delete the cache dir

- and then restart your job

 

Re: Could not copy CSV file to HDFS with Pentaho

New Contributor

Thanks, youngick

I set dfs.permissions.enabled=false in Ambari and restarted HDFS, then closed Spoon, cleared the PDI caches in ...\data-integration\system\karaf\caches\ and in ~\.pentaho\caches, started Spoon again, and reran the job, with the same error. Nothing changed.
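For anyone repeating this, the cache cleanup can be scripted roughly like this (a sketch; PDI_HOME is a placeholder for your data-integration install directory, and the paths are the defaults mentioned above):

```shell
#!/usr/bin/env bash
# Remove the PDI/Karaf caches while Spoon is closed.
# PDI_HOME is a placeholder for your data-integration directory (assumption).
PDI_HOME="${PDI_HOME:-$HOME/data-integration}"

rm -rf "$PDI_HOME/system/karaf/caches"
rm -rf "$HOME/.pentaho/caches"
```

On Windows the same two directories live under the PDI install folder and the user profile, respectively.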

 

I also tried something as drastic as a NameNode format, as described here:

https://stackoverflow.com/questions/15571584/writing-to-hdfs-could-only-be-replicated-to-0-nodes-ins...

This method didn't help either.

 

As this is only a study case, I'll come back to it later.