Support Questions

Find answers, ask questions, and share your expertise

HDPCD practice exam task 1, how to import data into HDFS?

avatar

I am having problems creating an AWS EC2 instance because of an account limitation, and Amazon is still processing my request.

So I am using the Hortonworks Sandbox with VirtualBox instead. I tried to upload the "flightdelays" data to the local file system and then use "hdfs dfs -copyFromLocal".

But I am actually confused about what "local" means. Does it mean the file system I see after connecting as maria_dev@localhost? If so, how can I upload files to /home/maria_dev/datasets? I also tried to create the datasets directory through Ambari, but I can't find the files under /home/maria_dev. I am very confused about how to complete Task 1 using the sandbox.

1 ACCEPTED SOLUTION

avatar
Super Guru

@seninus The copy-to and copy-from HDFS syntax is as follows:

hdfs dfs -copyFromLocal /local/folder/file.txt /hdfs/folder/

hdfs dfs -copyToLocal /hdfs/folder/file2.txt /local/folder/

where /local/ is a path on the local file system and /hdfs/ is a path in HDFS.

It is also important to note that you will usually want to execute those commands as the hdfs user, so:

sudo su - hdfs

or

sudo su - hdfs -c "hdfs dfs -copyFromLocal /local/folder/file.txt /hdfs/folder/"
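Putting it together for the practice-exam data, a minimal sketch might look like the following. It runs as maria_dev so that no sudo is needed: relative HDFS paths resolve to /user/maria_dev, which maria_dev owns in HDFS. The local file locations are assumptions; adjust them to wherever the CSVs actually sit on the sandbox.

```shell
# Run these at a terminal prompt ON THE SANDBOX NODE, logged in as maria_dev.
# Relative HDFS paths below resolve to /user/maria_dev/... in HDFS.

# Create a target folder in HDFS (-p: no error if it already exists).
hdfs dfs -mkdir -p flightdelays

# Copy the CSVs from the sandbox's local file system into HDFS.
# The local path is an assumption; change it to match your setup.
hdfs dfs -copyFromLocal /home/maria_dev/datasets/flightdelays*.csv flightdelays/

# Verify the files landed in HDFS.
hdfs dfs -ls flightdelays
```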

If this answer is helpful please choose ACCEPT.




avatar

You mean I have to install Hadoop (HDFS) locally on my Mac first, and then use sudo su - hdfs in my Mac terminal? I think I installed Hadoop (I haven't configured it; it seems to be a long process), and when I tried "su - hdfs", the Mac terminal gave back "su: unknown login: hdfs"...

avatar
Super Guru

@seninus Those hdfs commands are meant to be executed at a terminal prompt on the sandbox node, not on your Mac. "Local" in copyFromLocal means the sandbox node's file system, NOT your local Mac.

avatar

Yes, that is what I understood and was trying to do. I uploaded the flightdelays*.csv files to /datasets/flightdelays/ through Ambari, but I can't see the files locally when I log in as maria_dev in the terminal...

[screenshot: screen-shot-2018-08-14-at-134533.png]

[screenshot: screen-shot-2018-08-14-at-134743.png]

avatar
Super Guru

Now you just need to copy from HDFS to the local file system:

sudo su - hdfs -c "hdfs dfs -copyToLocal /datasets /tmp"

mv /tmp/datasets /home/maria_dev

The last command is needed because the hdfs user can't write to /home/maria_dev. So copy from HDFS to /tmp, then move from /tmp to /home/maria_dev.
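As a hedged sketch, the full sequence might look like this. The final chown step is an addition not in the original answer: the files arrive owned by the hdfs user, so handing ownership to maria_dev is assumed to be what you want here.

```shell
# Copy the HDFS /datasets tree to /tmp as the hdfs user,
# since hdfs cannot write into /home/maria_dev directly.
sudo su - hdfs -c "hdfs dfs -copyToLocal /datasets /tmp"

# Move the tree into maria_dev's home directory.
sudo mv /tmp/datasets /home/maria_dev/

# Hand ownership to maria_dev (assumption: you want maria_dev to
# be able to read and modify these files afterwards).
sudo chown -R maria_dev:maria_dev /home/maria_dev/datasets
```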

avatar
Super Guru

Additionally, the sandbox probably has a folder path local to the Mac mounted on the sandbox file system. You would need to use that path to get files from your Mac to the sandbox, and then copy from the sandbox into HDFS.
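If no shared folder is mounted, scp over the sandbox's forwarded SSH port is a common alternative. A sketch, run from the Mac terminal: the port 2222 and localhost forwarding are typical Hortonworks sandbox defaults, so treat them as assumptions and verify them in your VM's port-forwarding settings, and adjust the source path to wherever your CSVs live.

```shell
# From the Mac terminal (NOT the sandbox): copy the CSVs to the
# sandbox over SSH. Port 2222 is the usual VirtualBox port-forward
# for the sandbox's SSH; the source path is an assumption.
scp -P 2222 ~/Downloads/flightdelays*.csv \
    maria_dev@localhost:/home/maria_dev/datasets/
```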

avatar

Thanks for the tips. Yeah, I found a similar question and solved it. Thanks!

https://community.hortonworks.com/questions/9371/am-i-able-to-copy-a-file-from-my-mac-usersrevgeolo....

avatar
Super Guru

@seninus Glad you got it working. Please click accept on the main answer; it helps close the question and gives me some reputation points. ;O)