Copying files to Hadoop Error

New Contributor

Hi,

I am trying to load a file into the Hortonworks sandbox platform (HDP-2.5.0.0) using the Pentaho ETL "Hadoop Copy Files" step. When I run the ETL job I get the following error:

2017/06/27 16:41:57 - Hadoop Copy Files - ERROR (version 7.0.0.0-25, build 1 from 2016-11-05 15.35.36 by buildguy) : File System Exception: Could not copy "file:///C:/Users/Julius Gamboa/Downloads/ebooks/Pentaho/Pentaho for Big Data Analytics/2159OS_03_Code/samples/data/product-price-history.tsv.gz" to "hdfs://sandbox:8020/user/maria_dev/product-price-history.tsv.gz".
2017/06/27 16:41:57 - Hadoop Copy Files - ERROR (version 7.0.0.0-25, build 1 from 2016-11-05 15.35.36 by buildguy) : Caused by: Could not close the output stream for file "hdfs://sandbox:8020/user/maria_dev/product-price-history.tsv.gz".
2017/06/27 16:41:57 - Hadoop Copy Files - ERROR (version 7.0.0.0-25, build 1 from 2016-11-05 15.35.36 by buildguy) : Caused by: File /user/maria_dev/product-price-history.tsv.gz could only be replicated to 0 nodes instead of minReplication (=1). There are 1 datanode(s) running and 1 node(s) are excluded in this operation.
    at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1641)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getNewBlockTargets(FSNamesystem.java:3198)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3122)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:843)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:500)
    at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2313)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2309)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2307)

I have searched for this error and found many similar cases, but the instructions are hard to follow as I am a beginner at administering a Hadoop environment. Hopefully someone can help guide me to a solution. Thank you in advance.

Regards,

Julius

2 REPLIES

Re: Copying files to Hadoop Error

Super Mentor

@Julius Gamboa

Are you sure that your NameNode hostname is "sandbox"? In your output we see:

hdfs://sandbox:8020/user/maria_dev/product-price-history.tsv.gz


In the case of the Hortonworks sandbox, it should instead be the following. Please check the "fs.defaultFS" setting in your HDFS configuration (Advanced core-site.xml):

hdfs://sandbox.hortonworks.com:8020
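
You can confirm the configured value from inside the sandbox. As a rough sketch (assuming the sandbox VM exposes SSH on port 2222 with the default root login, which may differ on your install):

# SSH into the sandbox VM, then print the configured default filesystem
ssh root@sandbox.hortonworks.com -p 2222
hdfs getconf -confKey fs.defaultFS
# On HDP 2.5 this should typically print: hdfs://sandbox.hortonworks.com:8020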


Also, please make sure that you have the correct HDFS configuration directory set when running HDFS commands.
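
For example, on a standard HDP layout the client configuration lives under /etc/hadoop/conf (an assumption; your layout may differ):

# Point the Hadoop client at the sandbox configuration, then test a simple listing
export HADOOP_CONF_DIR=/etc/hadoop/conf
hdfs dfs -ls /user/maria_dev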


Are you able to reach that host and port from the machine where you are running the copy files command?

Please check your /etc/hosts file to confirm that you have mapped the sandbox hostname correctly, either as:

sandbox

OR

sandbox.hortonworks.com
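
To test reachability from the client machine, you can try something like the following (Windows commands shown; the telnet client may need to be enabled, and Test-NetConnection is the PowerShell alternative):

ping sandbox.hortonworks.com
telnet sandbox.hortonworks.com 8020
Test-NetConnection sandbox.hortonworks.com -Port 8020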


Re: Copying files to Hadoop Error

New Contributor

Hi @Jay SenSharma. Thank you for your response. I have tried changing the hostname but am still getting the error:

2017/06/28 00:30:02 - Hadoop Copy Files - ERROR (version 7.0.0.0-25, build 1 from 2016-11-05 15.35.36 by buildguy) : File System Exception: Could not copy "file:///C:/Users/Julius Gamboa/Downloads/ebooks/Pentaho/Pentaho for Big Data Analytics/2159OS_03_Code/samples/data/product-price-history.tsv.gz" to "hdfs://sandbox.hortonworks.com:8020/user/maria_dev/product-price-history.tsv.gz".
2017/06/28 00:30:02 - Hadoop Copy Files - ERROR (version 7.0.0.0-25, build 1 from 2016-11-05 15.35.36 by buildguy) : Caused by: Could not close the output stream for file "hdfs://sandbox.hortonworks.com:8020/user/maria_dev/product-price-history.tsv.gz".
2017/06/28 00:30:02 - Hadoop Copy Files - ERROR (version 7.0.0.0-25, build 1 from 2016-11-05 15.35.36 by buildguy) : Caused by: File /user/maria_dev/product-price-history.tsv.gz could only be replicated to 0 nodes instead of minReplication (=1). There are 1 datanode(s) running and 1 node(s) are excluded in this operation.
    at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1641)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getNewBlockTargets(FSNamesystem.java:3198)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3122)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:843)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:500)
    at ...

I am using a Windows machine with a VirtualBox instance of the HDP sandbox. Do you mean my Windows hosts file? I have created the entry below.

# Copyright (c) 1993-2009 Microsoft Corp. ...

192.168.43.122 sandbox.hortonworks.com
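
In case it helps, I can also check from inside the sandbox VM whether the DataNode is registered and has free capacity (a rough check, assuming SSH access to the VM), since the error says 1 node was excluded:

# Run as the hdfs superuser inside the sandbox
sudo -u hdfs hdfs dfsadmin -report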