Created on 11-19-2014 01:47 AM - edited 09-16-2022 02:13 AM
Hello dear reader,
I am configuring a CDH5 Hadoop cluster on 3 nodes:
- 1 NameNode + ResourceManager
- 1 DataNode + NodeManager + JobHistory Server
- 1 DataNode + NodeManager
HDFS works.
I configured YARN using advice from Cloudera and other websites.
I tried to run the Pi estimation example with this command on my NameNode:
asimon@namenode:/etc/hadoop/conf$ sudo hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples-2.5.0-cdh5.2.0.jar pi 2 2
which gives me :
Number of Maps = 2
Samples per Map = 2
Wrote input for Map #0
Wrote input for Map #1
Starting Job
14/11/18 17:03:04 INFO client.RMProxy: Connecting to ResourceManager at namenode.beijaflore.com/192.168.54.111:8032
14/11/18 17:03:04 INFO input.FileInputFormat: Total input paths to process : 2
14/11/18 17:03:04 INFO mapreduce.JobSubmitter: number of splits:2
14/11/18 17:03:05 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1416326563510_0001
14/11/18 17:03:05 INFO impl.YarnClientImpl: Submitted application application_1416326563510_0001
14/11/18 17:03:05 INFO mapreduce.Job: The url to track the job: http://namenode.beijaflore.com:8088/proxy/application_1416326563510_0001/
14/11/18 17:03:05 INFO mapreduce.Job: Running job: job_1416326563510_0001
But the job is stuck there and never actually runs... My 3 VMs are headless servers, so I have no UI to check whether the job is really pending.
As for the actual logs, in mapred-mapred-historyserver-datanode1.out I get:
14/11/18 17:32:34 INFO hs.JobHistory: History Cleaner started
14/11/18 17:32:34 INFO hs.JobHistory: History Cleaner complete
14/11/18 17:35:04 INFO hs.JobHistory: Starting scan to move intermediate done files
14/11/18 17:38:04 INFO hs.JobHistory: Starting scan to move intermediate done files
14/11/18 17:41:04 INFO hs.JobHistory: Starting scan to move intermediate done files
14/11/18 17:44:04 INFO hs.JobHistory: Starting scan to move intermediate done files
(I interrupted the first job and ran it a second time, which is why the timestamps don't match.)
It does this every 3 minutes... (I left it running for a whole night and it just kept going.)
I don't know what is wrong. I thought I had a hint in the minimum and maximum memory settings (since my total configuration is only 6 vCPUs and 8 GB of RAM), but changing them actually made no difference.
If you have any idea what is wrong, or where I can look to find the problem, I would be grateful for any information.
Cheers
Created 11-28-2014 07:06 AM
Thanks for your replies.
I did not touch the cluster for a few days, and now I have fixed the problem. Maybe the memory settings helped a little: I also installed a NodeManager on my NameNode, set the key yarn.nodemanager.resource.memory-mb to 3 GB, and set the maximum container size to 2 GB.
It appears from the NodeManager logs that I had trouble with this:
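A sketch of the yarn-site.xml values this describes, assuming the 2 GB container cap refers to yarn.scheduler.maximum-allocation-mb (values are in MB):

```xml
<!-- Sketch of the settings described above; assumes yarn-site.xml -->
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>3072</value> <!-- 3 GB per NodeManager available for containers -->
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>2048</value> <!-- 2 GB maximum per container request -->
</property>
```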
<property>
<description>Where to aggregate logs to.</description>
<name>yarn.nodemanager.remote-app-log-dir</name>
<value>hdfs://namenode.*****.com:8020</value>
</property>
So I just reverted it to the default by deleting this property. Now it works!
See the beautiful result of my Pi example:
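For reference, a well-formed custom value needs a path component, not just a bare hdfs://host:port authority; left unset, YARN falls back to its default of /tmp/logs on the default filesystem. A hypothetical example (hostname and path are placeholders, not this cluster's):

```xml
<!-- Hypothetical: remote-app-log-dir with both an authority and a path -->
<property>
  <name>yarn.nodemanager.remote-app-log-dir</name>
  <value>hdfs://namenode.example.com:8020/var/log/hadoop-yarn/apps</value>
</property>
```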
asimon@namenode:/etc/hadoop/conf$ sudo -u mapred hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples-2.5.0-cdh5.2.0.jar pi 2 2
Number of Maps = 2
Samples per Map = 2
Wrote input for Map #0
Wrote input for Map #1
Starting Job
14/11/28 15:27:31 INFO client.RMProxy: Connecting to ResourceManager at namenode.beijaflore.com/192.168.54.111:8032
14/11/28 15:27:31 INFO input.FileInputFormat: Total input paths to process : 2
14/11/28 15:27:31 INFO mapreduce.JobSubmitter: number of splits:2
14/11/28 15:27:31 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1417184716260_0001
14/11/28 15:27:32 INFO impl.YarnClientImpl: Submitted application application_1417184716260_0001
14/11/28 15:27:32 INFO mapreduce.Job: The url to track the job: http://namenode.beijaflore.com:8088/proxy/application_1417184716260_0001/
14/11/28 15:27:32 INFO mapreduce.Job: Running job: job_1417184716260_0001
14/11/28 15:27:41 INFO mapreduce.Job: Job job_1417184716260_0001 running in uber mode : false
14/11/28 15:27:41 INFO mapreduce.Job: map 0% reduce 0%
14/11/28 15:27:46 INFO mapreduce.Job: map 50% reduce 0%
14/11/28 15:27:53 INFO mapreduce.Job: map 100% reduce 0%
14/11/28 15:27:54 INFO mapreduce.Job: map 100% reduce 100%
14/11/28 15:27:55 INFO mapreduce.Job: Job job_1417184716260_0001 completed successfully
14/11/28 15:27:55 INFO mapreduce.Job: Counters: 49
	File System Counters
		FILE: Number of bytes read=50
		FILE: Number of bytes written=306673
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=560
		HDFS: Number of bytes written=215
		HDFS: Number of read operations=11
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=3
	Job Counters
		Launched map tasks=2
		Launched reduce tasks=1
		Data-local map tasks=2
		Total time spent by all maps in occupied slots (ms)=108368
		Total time spent by all reduces in occupied slots (ms)=44920
		Total time spent by all map tasks (ms)=13546
		Total time spent by all reduce tasks (ms)=5615
		Total vcore-seconds taken by all map tasks=13546
		Total vcore-seconds taken by all reduce tasks=5615
		Total megabyte-seconds taken by all map tasks=13871104
		Total megabyte-seconds taken by all reduce tasks=5749760
	Map-Reduce Framework
		Map input records=2
		Map output records=4
		Map output bytes=36
		Map output materialized bytes=56
		Input split bytes=324
		Combine input records=0
		Combine output records=0
		Reduce input groups=2
		Reduce shuffle bytes=56
		Reduce input records=4
		Reduce output records=0
		Spilled Records=8
		Shuffled Maps =2
		Failed Shuffles=0
		Merged Map outputs=2
		GC time elapsed (ms)=142
		CPU time spent (ms)=1800
		Physical memory (bytes) snapshot=678866944
		Virtual memory (bytes) snapshot=3176591360
		Total committed heap usage (bytes)=454430720
	Shuffle Errors
		BAD_ID=0
		CONNECTION=0
		IO_ERROR=0
		WRONG_LENGTH=0
		WRONG_MAP=0
		WRONG_REDUCE=0
	File Input Format Counters
		Bytes Read=236
	File Output Format Counters
		Bytes Written=97
Job Finished in 24.739 seconds
Estimated value of Pi is 4.00000000000000000000
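As an aside, the estimate of 4.0 is not a bug: with 2 maps × 2 samples there are only 4 sample points. The point-in-quarter-circle idea behind the example can be sketched in Python (the sample points below are made up for illustration; the real job generates its points with a Halton quasi-random sequence):

```python
# Monte Carlo pi estimation, as in the hadoop-examples "pi" job:
# sample points in the unit square, count those inside the quarter circle,
# and estimate pi = 4 * inside / total.

def estimate_pi(points):
    """Estimate pi from (x, y) samples in the unit square."""
    inside = sum(1 for x, y in points if x * x + y * y <= 1.0)
    return 4.0 * inside / len(points)

# With only 4 samples, every point can easily land inside the quarter
# circle, which gives exactly the coarse estimate 4.0 seen above.
samples = [(0.1, 0.2), (0.3, 0.4), (0.5, 0.5), (0.25, 0.75)]
print(estimate_pi(samples))  # -> 4.0
```

More samples per map would tighten the estimate; the 4.0 just reflects how few points this smoke test uses.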
Thank you again for your replies. I'll continue to hang around here, especially when I get into trouble!
Miraculous
Created 11-20-2014 05:09 PM
Miraculous,
How have you set up your NMs: how much memory and how many cores are available on the nodes?
This sounds like you do not have enough space to start the containers.
Give the map and reduce containers 1 GB and 1 vcore to start with. Check the NM config values: the yarn.nodemanager.resource.memory-mb and yarn.nodemanager.resource.cpu-vcores settings.
Retry it after that.
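A sketch of the per-task sizes this suggests, assuming they go in mapred-site.xml (the NM capacity keys named above live in yarn-site.xml):

```xml
<!-- Sketch: 1 GB and 1 vcore per map/reduce task container (mapred-site.xml) -->
<property>
  <name>mapreduce.map.memory.mb</name>
  <value>1024</value>
</property>
<property>
  <name>mapreduce.reduce.memory.mb</name>
  <value>1024</value>
</property>
<property>
  <name>mapreduce.map.cpu.vcores</name>
  <value>1</value>
</property>
<property>
  <name>mapreduce.reduce.cpu.vcores</name>
  <value>1</value>
</property>
```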
Wilfred
Created 11-24-2014 03:03 AM
Thank you Wilfred for your answer.
I have the following resources :
2 vCPUs + 2 GB vRAM on each of my 2 DataNodes, and
2 vCPUs + 4 GB vRAM on the NameNode.
yarn-site.xml config is :
<configuration>
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>1024</value>
  </property>
  <property>
    <name>yarn.nodemanager.resource.cpu-vcores</name>
    <value>1</value>
    <description>Number of CPU cores that can be allocated for containers.</description>
  </property>
  <property>
    <name>yarn.scheduler.minimum-allocation-mb</name>
    <value>128</value>
    <description>Minimum limit of memory to allocate to each container request at the Resource Manager.</description>
  </property>
  <property>
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <value>2048</value>
    <description>Maximum limit of memory to allocate to each container request at the Resource Manager.</description>
  </property>
  <property>
    <name>yarn.scheduler.minimum-allocation-vcores</name>
    <value>1</value>
    <description>The minimum allocation for every container request at the RM, in terms of virtual CPU cores. Requests lower than this won't take effect, and the specified value will get allocated the minimum.</description>
  </property>
  <property>
    <name>yarn.scheduler.maximum-allocation-vcores</name>
    <value>2</value>
    <description>The maximum allocation for every container request at the RM, in terms of virtual CPU cores. Requests higher than this won't take effect, and will get capped to this value.</description>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>namenode.beijaflore.com</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
  </property>
  <property>
    <description>List of directories to store localized files in.</description>
    <name>yarn.nodemanager.local-dirs</name>
    <value>file:///var/lib/hadoop-yarn/cache/${user.name}/nm-local-dir</value>
  </property>
  <property>
    <description>Where to store container logs.</description>
    <name>yarn.nodemanager.log-dirs</name>
    <value>file:///var/log/hadoop-yarn/containers</value>
  </property>
  <property>
    <description>Where to aggregate logs to.</description>
    <name>yarn.nodemanager.remote-app-log-dir</name>
    <value>hdfs://var/log/hadoop-yarn/apps</value>
  </property>
  <property>
    <description>Classpath for typical applications.</description>
    <name>yarn.application.classpath</name>
    <value>
      $HADOOP_CONF_DIR,
      $HADOOP_COMMON_HOME/*,$HADOOP_COMMON_HOME/lib/*,
      $HADOOP_HDFS_HOME/*,$HADOOP_HDFS_HOME/lib/*,
      $HADOOP_MAPRED_HOME/*,$HADOOP_MAPRED_HOME/lib/*,
      $HADOOP_YARN_HOME/*,$HADOOP_YARN_HOME/lib/*
    </value>
  </property>
</configuration>
Then I restarted the YARN and JobHistory services on the 3 nodes and tried the job again... same problem.
I would like to add that once I've started the job, I can list and kill it with hadoop job -list and hadoop job -kill.
But the loop in the JobHistory log keeps going every 3 minutes -> INFO hs.JobHistory: Starting scan to move intermediate done files.
Even when I restart the services, the JobHistory log starts repeating again... How do I stop it?
/var/log/hadoop-yarn/containers is an empty directory.
Do I need more memory on the NodeManagers?
Thanks