How to find the YARN application ID for hadoop fs -copyFromLocal file1.dat /home/hadoop/file1.dat

Expert Contributor

I have a job that copies data from the local file system to HDFS:

1) hadoop fs -copyFromLocal file1.dat /home/hadoop/file1.dat

2) How can I find the YARN application ID for this copyFromLocal command?

Thanks,

1 ACCEPTED SOLUTION

Expert Contributor

Updating late, after checking further:

1) hadoop fs -copyFromLocal file1.dat /home/hadoop/file1.dat

:- This is a local command on the Linux server. You can check its local process with # ps -ef | grep file1.dat | grep -i copyFromLocal, which shows the process ID and confirms that it runs as a local process.

2) How to find the YARN application ID for this copyFromLocal command

:- Because it is a local command that uses the local server's resources, you won't find any MR/YARN job for it. On the cluster side the copy only involves HDFS (the NameNode and the DataNodes that receive the data); the ResourceManager does not launch an application for it.

Hence a "hadoop fs" command uses resources on the local Linux server and on the Hadoop cluster for the copy only. Since the process is purely local, it does not create MR/YARN jobs, so there is no application ID to look up.
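The process check above can be wrapped in a small helper. This is only a sketch: the function name filter_copy_procs is made up for illustration, and it reads a ps -ef listing from stdin so it can be tried on saved output as well as on a live system.

```shell
#!/bin/sh
# Hypothetical helper: keep only the lines of a `ps -ef` listing (read from
# stdin) that mention copyFromLocal and the given file name.
filter_copy_procs() {
  file_name="$1"
  # First grep matches the command case-insensitively; second grep treats the
  # file name as a fixed string so dots are not interpreted as regex wildcards.
  grep -i 'copyFromLocal' | grep -F -- "$file_name"
}

# Usage while the copy is running:
#   ps -ef | filter_copy_procs file1.dat
# The second column of each matching line is the local process ID.
#
# For comparison, `yarn application -list` shows running YARN applications;
# the copy is not expected to appear there because it is not a YARN job.
```

If the helper prints a line while yarn application -list shows nothing for the copy, that matches the explanation above: the work is done by a local client process, not by a YARN application.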


3 REPLIES


Hi @zkfs

There isn't one. For the above example, you will see an entry for a non-MapReduce client in the NameNode log, similar to this:

hadoop-hdfs-namenode.log:2018-08-24 06:44:41,819 INFO  hdfs.StateChange (FSNamesystem.java:completeFile(3759)) - DIR* completeFile: /user/hadoop/file1.dat._COPYING_ is closed by DFSClient_NONMAPREDUCE_956954044_1

What happens is this: the client calls the create() operation defined in the DistributedFileSystem class, then uses the DFSOutputStream class to write to an internal queue called the 'data queue'. That queue is consumed by the DataStreamer, which in turn requests block allocations for the data being written by the copyFromLocal command. There is no MapReduce/YARN job here, which you can see from the NONMAPREDUCE part of the client name in the NameNode log. For some other tools, such as DistCp, you would see MapReduce involved.
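To look for such entries yourself, a sketch along these lines pulls the client name out of the NameNode log. The function name find_hdfs_client_writes and the NAMENODE_LOG variable are made up for illustration, and the pattern assumes log lines shaped like the example above; the log location depends on your installation.

```shell
#!/bin/sh
# Hypothetical helper: print the DFSClient name recorded on completeFile
# lines for a given HDFS path in a NameNode log file.
find_hdfs_client_writes() {
  log_file="$1"
  hdfs_path="$2"
  # Keep completeFile lines, narrow them to the path (fixed-string match),
  # then extract only the DFSClient_... token from each remaining line.
  grep 'completeFile' "$log_file" \
    | grep -F -- "$hdfs_path" \
    | grep -o 'DFSClient_[A-Za-z0-9_-]*'
}

# Usage (NAMENODE_LOG is a placeholder for your NameNode log file):
#   find_hdfs_client_writes "$NAMENODE_LOG" /user/hadoop/file1.dat
```

A client name containing NONMAPREDUCE indicates a plain HDFS client such as hadoop fs, which is why no application ID is attached to the write.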

Expert Contributor

The entries will show up in the logs, but is there a command to check the application ID for a hadoop command? I am looking for something like that.

Example: for YARN we can list the running jobs with the YARN command # yarn application -list
