Support Questions
Find answers, ask questions, and share your expertise

Unable to copy from local to HDFS

Explorer

Hi

I am a brand new user. I have installed Hortonworks Sandbox on my VM virtual box manager.

I have logged into Ambari at 127.0.0.1 using the maria_dev user name.

I have installed Putty and set up a connection as maria_dev@sandbox-hdp.

I cannot copy a file from my local directory to HDFS. Do I need to set permissions or have I missed a step in the set up process?

Any assistance would be greatly appreciated.

Thanks


Super Mentor

@Matthew May

Can you please share the exact command and the error that you are getting while copying a file from local to HDFS?

Explorer

@Jay Kumar SenSharma

I have a file on my desktop called sample.txt (in location /Users/Matt/dir3/sample.txt).

I have tried this:

hadoop fs -copyFromLocal /Users/Matt/dir3/sample.txt /user/maria_dev/

and receive the error:

copyFromLocal: `/Users/Matt/dir3/sample.txt': No such file or directory

Super Mentor

Based on the error, it looks like you are trying to push files from your Mac desktop to the HDP Sandbox HDFS cluster. Please correct me if I am wrong.

copyFromLocal: `/Users/Matt/dir3/sample.txt': No such file or directory


Please check a few things:

1. Does the file "/Users/Matt/dir3/sample.txt" exist, and does the user running the "hadoop" command have read access to it? Please share the output of the following command.

# ls -l /Users/Matt/dir3/sample.txt

2. If you just want to put local files into HDFS, then another simple approach is to use the Ambari File View.


Additionally, if you want to put a file inside the "/user/maria_dev/" HDFS directory, then the user running the hadoop command must either belong to the "hadoop" group or have the username "maria_dev", because the HDFS directory has the following permissions.

[root@sandbox /]# su - hdfs -c "hadoop fs -ls /user" | grep maria_dev
drwxr-xr-x   - maria_dev hadoop          0 2017-10-21 11:01 /user/maria_dev
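As a quick (illustrative) way to verify this before running the copy, you can check the current user's group membership inside the Sandbox; whoami/id are standard Linux tools, so this sketch should work on any Linux host:

```shell
# Check which groups the current user belongs to. To write into
# /user/maria_dev (permissions drwxr-xr-x maria_dev hadoop), the user
# must either be maria_dev or be a member of the "hadoop" group.
id -Gn "$(whoami)"
```

If "hadoop" does not appear in the output (and you are not maria_dev), the write will be refused.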


Super Mentor

In addition to the permissions issue: if you want to run the "hadoop" client commands from your local machine as you posted, then you will need to make sure that your local machine is set up as a Hadoop client machine (meaning it has all the Hadoop libraries plus the core-site.xml and hdfs-site.xml files).


However, as you are using the Hortonworks Sandbox, it will be much easier to put files into HDFS from your laptop using the Ambari File View. Please see: https://hortonworks.com/tutorial/hadoop-tutorial-getting-started-with-hdp/section/2/

Explorer

@ Jay Kumar SenSharma

You are correct, I am trying to push the file from my (Windows) desktop to the Sandbox. The file does exist; however, I am unsure whether the correct permissions have been set. The output is:

ls: cannot access /Users/Matt/dir3/sample.txt: No such file or directory

I understand that I can use the Ambari File View, thank you; however, I want to upload multiple files at once from a directory (the sample file is a test).

Explorer

@Jay Kumar SenSharma


Could you please explain how the user running the hadoop command can belong to the "hadoop" group?

Super Mentor

If you want to run the "hadoop" client commands from your Windows machine as you posted, then you will need to make sure that your local machine is set up as a Hadoop client machine (meaning it has all the Hadoop libraries plus the core-site.xml and hdfs-site.xml files).

However, as you are using the Hortonworks Sandbox, it will be much easier to put files into HDFS from your laptop using the Ambari File View. Please see: https://hortonworks.com/tutorial/hadoop-tutorial-getting-started-with-hdp/section/2/

Explorer

@Jay Kumar SenSharma

thank you, however I wish to upload multiple files at once from a directory. If I understand correctly, the Ambari File View only lets me upload a single file at a time.

Super Mentor

Regarding your query: "Could you please explain how the user who is running hadoop command can belong to "hadoop" group?"


For a Windows environment I cannot help much. However, in a Linux-based environment you can simply use the following to add "testuser" to the hadoop group.

# sudo adduser --ingroup hadoop testuser


If you really want to test the "copyFromLocal" command, then you should do it inside the Sandbox instance, as it already has the "hadoop" group present and all the required libraries.

Example:

Log in to the Sandbox using an SSH session on port 2222 (this port must be used instead of the default SSH port). Or, while using Putty, please set the SSH port to 2222.

# ssh root@127.0.0.1  -p 2222


Once you are inside the Sandbox terminal, try running the commands:

# su - maria_dev
# hadoop fs -copyFromLocal /etc/passwd /user/maria_dev/


Instead of the file "/etc/passwd" you can push your own files to HDFS. You will just need to SCP your files to the Sandbox, and then from the Sandbox you can put them into HDFS.

Or use the Ambari File View to post your files directly to HDFS.
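The SCP-then-put workflow can be sketched end to end. This is an illustrative example run from your local machine, assuming the Sandbox listens for SSH on port 2222 and your file is at /Users/Matt/dir3/sample.txt (adjust both to your setup); the file is staged in /tmp so that maria_dev can read it:

```shell
# 1. Copy the local file into the Sandbox over SCP (Sandbox SSH is on 2222).
#    /tmp is world-readable, so maria_dev will be able to read the file there.
scp -P 2222 /Users/Matt/dir3/sample.txt root@127.0.0.1:/tmp/

# 2. Log in to the Sandbox, switch to maria_dev, and push the file into HDFS.
ssh root@127.0.0.1 -p 2222
su - maria_dev
hadoop fs -copyFromLocal /tmp/sample.txt /user/maria_dev/
```

This is a sketch of the workflow described above, not something you can run without a reachable Sandbox.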

Explorer

@Jay Kumar SenSharma

thank you for your reply.

From Putty I used your command above:

ssh root@127.0.0.1  -p 2222

And received the following error:

ssh: connect to host 127.0.0.1 port 2222: Connection refused

Can you also please explain what you mean by "you will need to SCP your files to Sandbox and then from Sandbox you can put them to HDFS"?

Super Mentor

@Matt

I am not sure how you entered that command "ssh root@127.0.0.1 -p 2222" from Putty.

But the best way to connect to the Sandbox is described in the following tutorial, which might be really helpful: https://hortonworks.com/tutorial/learning-the-ropes-of-the-hortonworks-sandbox/#terminal-access


Usually, if you want to put a file from Windows onto a Linux host, UI utilities like WinSCP are the best way to do SCP.

Super Mentor

@Matt

Using WinSCP, once we are able to copy the file to the Sandbox, we should be able to run the following commands to push those files to HDFS from the Sandbox.

Suppose on the Sandbox host the file is present at "/root/sample.txt"; then it can be pushed to HDFS as follows:


You will need to SSH into the Sandbox first and then run the following commands:

# su - maria_dev
# hadoop fs -copyFromLocal  /root/sample.txt  /user/maria_dev/
(OR)
# hdfs dfs -put  /root/sample.txt  /user/maria_dev/
# hdfs dfs -ls  /user/maria_dev/


NOTE: the user "maria_dev" should have at least read permission on the file "/root/sample.txt" before trying to read/push it to HDFS.



Super Mentor

@Matt

If this issue is resolved, it would also be great if you could mark this HCC thread as Answered by clicking the "Accept" button on the correct answer. That way, other HCC users can quickly find the solution when they encounter the same issue.

Explorer

@Jay Kumar SenSharma

OK, thank you for your help anyway. I appreciate your replying to me, even though I have had no success with it.

@Matt

Heya. @Jay Kumar SenSharma is right on point, though it sounds like there's a bit of confusion about what Hadoop's "copyFromLocal" command does and what it expects.

You run the Hadoop command with the "copyFromLocal" option from within the Sandbox. This command looks on the Sandbox's local filesystem and copies the files you specify into the Hadoop filesystem.

I can definitely see how this may be a bit confusing; it sort of sounds like the command would pull files from your machine into HDFS on the Sandbox. Remember, though, that the Sandbox exists on a virtual machine that by default is separate from your own and cannot access your filesystem.

So you can definitely use that command, but you'll first have to move your local files into the Sandbox. There are a bunch of ways to do that in bulk; check out Jay's post regarding SCP.
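For the bulk case, one hedged sketch (assuming the Sandbox's SSH port 2222 and a hypothetical local directory /Users/Matt/dir3; adjust paths to your setup) is to SCP the whole directory across in one shot and then put it into HDFS as a directory:

```shell
# Copy an entire local directory to the Sandbox in one command
# (scp -r recurses into the directory; /tmp keeps it readable by maria_dev).
scp -P 2222 -r /Users/Matt/dir3 root@127.0.0.1:/tmp/

# Then, inside the Sandbox, put the whole directory into HDFS at once
# and list it to confirm the upload.
ssh root@127.0.0.1 -p 2222
su - maria_dev
hdfs dfs -put /tmp/dir3 /user/maria_dev/
hdfs dfs -ls /user/maria_dev/dir3
```

This avoids uploading files one at a time through the File View: `scp -r` and `hdfs dfs -put` both accept directories, so the whole tree moves in two commands.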

Hope that clarifies a bit, and good luck! 🙂

Explorer

@Matt,

Hope you were successful in copying your files to HDFS.

For any other "Getting Started Questions" you might have please check out the Learning the Ropes Hortonworks Tutorial.
