Created on 10-03-2013 09:38 PM - edited 09-16-2022 01:48 AM
Hi,
I am installing CDH 4.4 on CentOS 6.3 (a virtual machine with 2 vCPUs, 2 GB RAM, and a 20 GB disk) using Cloudera Manager via Installation Path A. I have two problems that have troubled me for a long time:
1. The first run of the "Create a temporary directory" step always fails, with no exception logged in /var/run/cloudera-scm-agent/process/17-hdfs-NAMENODE-createtmp/logs/stderr.log. When I retry it later, it usually succeeds.
I followed the relevant information in https://groups.google.com/a/cloudera.org/forum/#!topic/cdh-user/jlgfjQy56So and added 127.0.1.1 to my /etc/hosts, but the failure still occurs on the first attempt.
2. When I reach the 'Starting Cloudera Management Services' step, my virtual machine sustains very high IOPS, which causes it to stop responding so that I cannot complete the rest of the installation.
I ran the command "vmstat 2" and found more than 30 processes waiting for CPU scheduling, and the following errors appear in /var/log/cloudera-scm-agent/cloudera-scm-agent.log:
[03/Oct/2013 21:07:23 +0000] 32181 Monitor-HostMonitor throttling_logger ERROR (315 skipped) Failed to collect java-based DNS names
Traceback (most recent call last):
File "/usr/lib64/cmf/agent/src/cmf/monitor/host/dns_names.py", line 53, in collect
result, stdout, stderr = self._subprocess_with_timeout(args, self._poll_timeout)
File "/usr/lib64/cmf/agent/src/cmf/monitor/host/dns_names.py", line 42, in _subprocess_with_timeout
return subprocess_with_timeout(args, timeout)
File "/usr/lib64/cmf/agent/src/cmf/monitor/host/subprocess_timeout.py", line 40, in subprocess_with_timeout
close_fds=True)
File "/usr/lib64/python2.6/subprocess.py", line 639, in __init__
errread, errwrite)
File "/usr/lib64/python2.6/subprocess.py", line 1228, in _execute_child
raise child_exception
OSError: [Errno 2] No such file or directory
[03/Oct/2013 21:07:50 +0000] 32181 Monitor-DataNodeMonitor abstract_monitor ERROR Error fetching metrics at 'http://cdh1.jsnewland.com:50075/jmx'
Traceback (most recent call last):
File "/usr/lib64/cmf/agent/src/cmf/monitor/abstract_monitor.py", line 252, in collect_metrics_from_url
openedUrl = self.urlopen(url, username=username, password=password)
File "/usr/lib64/cmf/agent/src/cmf/monitor/abstract_monitor.py", line 234, in urlopen
password=password)
File "/usr/lib64/cmf/agent/src/cmf/url_util.py", line 39, in urlopen_with_timeout
return opener.open(url, data, timeout)
File "/usr/lib64/python2.6/urllib2.py", line 391, in open
response = self._open(req, data)
File "/usr/lib64/python2.6/urllib2.py", line 409, in _open
'_open', req)
File "/usr/lib64/python2.6/urllib2.py", line 369, in _call_chain
result = func(*args)
File "/usr/lib64/python2.6/urllib2.py", line 1190, in http_open
return self.do_open(httplib.HTTPConnection, req)
File "/usr/lib64/python2.6/urllib2.py", line 1165, in do_open
raise URLError(err)
URLError: <urlopen error timed out>
[03/Oct/2013 21:07:50 +0000] 32181 Monitor-NameNodeMonitor abstract_monitor ERROR Error fetching metrics at 'http://cdh1.jsnewland.com:50070/jmx'
Traceback (most recent call last):
File "/usr/lib64/cmf/agent/src/cmf/monitor/abstract_monitor.py", line 252, in collect_metrics_from_url
openedUrl = self.urlopen(url, username=username, password=password)
File "/usr/lib64/cmf/agent/src/cmf/monitor/abstract_monitor.py", line 234, in urlopen
password=password)
File "/usr/lib64/cmf/agent/src/cmf/url_util.py", line 39, in urlopen_with_timeout
return opener.open(url, data, timeout)
File "/usr/lib64/python2.6/urllib2.py", line 391, in open
response = self._open(req, data)
File "/usr/lib64/python2.6/urllib2.py", line 409, in _open
'_open', req)
File "/usr/lib64/python2.6/urllib2.py", line 369, in _call_chain
result = func(*args)
File "/usr/lib64/python2.6/urllib2.py", line 1190, in http_open
return self.do_open(httplib.HTTPConnection, req)
File "/usr/lib64/python2.6/urllib2.py", line 1165, in do_open
raise URLError(err)
URLError: <urlopen error timed out>
[03/Oct/2013 21:07:56 +0000] 32181 MainThread agent ERROR Heartbeating to 192.168.125.135:7182 failed.
Traceback (most recent call last):
File "/usr/lib64/cmf/agent/src/cmf/agent.py", line 747, in send_heartbeat
response = self.requestor.request('heartbeat', dict(request=heartbeat))
File "/usr/lib64/cmf/agent/build/env/lib/python2.6/site-packages/avro-1.6.3-py2.6.egg/avro/ipc.py", line 145, in request
return self.issue_request(call_request, message_name, request_datum)
File "/usr/lib64/cmf/agent/build/env/lib/python2.6/site-packages/avro-1.6.3-py2.6.egg/avro/ipc.py", line 256, in issue_request
call_response = self.transceiver.transceive(call_request)
File "/usr/lib64/cmf/agent/build/env/lib/python2.6/site-packages/avro-1.6.3-py2.6.egg/avro/ipc.py", line 485, in transceive
result = self.read_framed_message()
File "/usr/lib64/cmf/agent/build/env/lib/python2.6/site-packages/avro-1.6.3-py2.6.egg/avro/ipc.py", line 489, in read_framed_message
response = self.conn.getresponse()
File "/usr/lib64/python2.6/httplib.py", line 990, in getresponse
response.begin()
File "/usr/lib64/python2.6/httplib.py", line 391, in begin
version, status, reason = self._read_status()
File "/usr/lib64/python2.6/httplib.py", line 349, in _read_status
line = self.fp.readline()
File "/usr/lib64/python2.6/socket.py", line 433, in readline
data = recv(1)
timeout: timed out
Although the log shows that the connection timed out, I can still access the URL http://cdh1.jsnewland.com:50075/jmx through Mozilla Firefox on the host cdh1, and the browser returns some JSON-formatted information.
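The same endpoint can also be checked from the shell on cdh1 with curl (just a sketch, assuming curl is installed; the -m option caps the request at 20 seconds, and the URL is the one from the log above):
# curl -m 20 http://cdh1.jsnewland.com:50075/jmx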
Thanks.
Created 10-04-2013 09:35 AM
In your hosts file, do not comment out the loopback interface (127.0.0.1); just leave it at its normal value. You can leave the IPv6 entry in place as well; it is not necessary to comment out either of them.
From the command line in the shell, run "getent hosts cdh1.jsnewland.com" and "getent hosts 192.168.125.135" to verify that name resolution is doing what you want. If either comes back with unexpected values, verify in your VMs that /etc/nsswitch.conf is set to "hosts: files dns" in that order, rather than "hosts: dns files".
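A rough sketch of what a healthy result looks like (the values shown here are only assumptions based on the host name and IP that appear in the agent log above; yours should match whatever you intend cdh1 to resolve to):
# getent hosts cdh1.jsnewland.com
192.168.125.135 cdh1.jsnewland.com cdh1
# getent hosts 192.168.125.135
192.168.125.135 cdh1.jsnewland.com cdh1
# grep '^hosts' /etc/nsswitch.conf
hosts:      files dns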
What is the host OS you are using for the VM? If you had an 8 GB system, you would be much better off running a single 3 to 4 GB VM. Keep in mind that the parent OS (especially if a GUI desktop is in use) also needs memory, including the overhead of running the VM software and its instances.
At this scale of physical system (6 GB RAM), attempting to emulate a cluster of 3 x 2 GB nodes is going to get in the way of your attempt to use Hadoop. Take a look at our example VM that is available for download; it is set up to run in a laptop/desktop configuration. The sample VM uses 4 GB as its base memory configuration.
For the vmstat information you provided, here is the breakdown of what it is telling you:
The attached file vmstat-test.txt is a test of creating an HDFS directory 12 times on a VM with 12 GB RAM and 6 GB swap configured, on a physical host with 128 GB RAM. Note the differences from your output.
In the explanation of the vmstat column titles below, the "<---" tags indicate what you should focus on when evaluating vmstat output. Compare your vmstat output to the test I did in the attached file.
You are swapping heavily. It is not a question of being out of swap (it would crash at that point); it is the volume of paging back and forth that is choking the VM. Below is your vmstat output, re-pasted:
# vmstat 2
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
2 21 913364 52192 640 20384 212 411 673 441 266 521 8 3 67 22 0
1 23 912360 60864 496 19840 1398 572 1912 576 356 578 8 3 0 90 0
0 17 911292 57136 504 19892 2200 304 2256 304 340 621 4 3 0 93 0
1 17 909904 50180 536 22032 1530 18 2600 18 376 630 7 5 0 88 0
1 15 908268 49372 536 23460 1906 22 2614 22 341 643 3 3 0 95 0
2 19 906812 49084 544 25304 1838 0 3014 0 328 778 3 5 0 92 0
0 15 906036 49152 532 26032 1582 220 2500 220 297 540 3 5 0 92 0
3 16 908180 62844 536 23092 1460 2220 2286 2220 477 591 18 8 0 74 0
2 12 906608 58860 536 25644 1830 0 3120 0 440 603 10 11 0 79 0
3 21 904808 53412 536 26244 2370 0 2668 0 578 767 5 9 0 86 0
Now, to understand how to read the vmstat output:
Procs
r: The number of processes waiting for run time.
b: The number of processes in uninterruptible sleep. <---
Memory
swpd: the amount of virtual memory used. <---
free: the amount of idle memory.
buff: the amount of memory used as buffers.
cache: the amount of memory used as cache.
inact: the amount of inactive memory. (-a option)
active: the amount of active memory. (-a option)
Swap
si: Amount of memory swapped in from disk (/s). <---
so: Amount of memory swapped to disk (/s). <---
IO
bi: Blocks received from a block device (blocks/s).
bo: Blocks sent to a block device (blocks/s).
System
in: The number of interrupts per second, including the clock. <---
cs: The number of context switches per second. <---
CPU
These are percentages of total CPU time.
us: Time spent running non-kernel code. (user time, including nice time)
sy: Time spent running kernel code. (system time)
id: Time spent idle. Prior to Linux 2.5.41, this includes IO-wait time.
wa: Time spent waiting for IO. Prior to Linux 2.5.41, included in idle. <---
st: Time stolen from a virtual machine. Prior to Linux 2.6.11, unknown.
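If you want a quick confirmation of the swap pressure outside of vmstat, the following standard commands (just a sketch, nothing Cloudera-specific) show total versus used swap and which processes hold the most resident memory:
# free -m
# swapon -s
# ps aux --sort=-rss | head -15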
Created 10-04-2013 09:33 AM
(pasted from mail thread discussion)
With 2 GB it is going to be tough to prevent the VM from swapping back and forth between disk and RAM... How much physical RAM is available on the machine you are running the VM on? We run with 4 GB in the demo VM (which might be worth downloading and using to check things out).
Also, what do the following commands show in your VM? (A sketch of a typical /etc/hosts follows the commands below.)
# hostname
and
# ifconfig -a
and
# cat /etc/hosts
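For reference, a sane /etc/hosts for this setup might look roughly like the following. This is only a sketch: the loopback lines are the CentOS 6 defaults, and the last line uses the IP and FQDN that appear in the agent log earlier in the thread, so substitute your actual values.
127.0.0.1       localhost localhost.localdomain localhost4 localhost4.localdomain4
::1             localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.125.135 cdh1.jsnewland.com cdh1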
Created 10-04-2013 09:36 AM
(text from the attached file "vmstat-test.txt" - from the mail thread)
[root@cehd3 ~]# for i in {1..12}; do date ; echo "sudo -u hdfs hadoop fs -mkdir /foo$i";sudo -u hdfs hadoop fs -mkdir /foo$i; done
Fri Oct 4 09:57:24 MDT 2013
sudo -u hdfs hadoop fs -mkdir /foo1
Fri Oct 4 09:57:26 MDT 2013
sudo -u hdfs hadoop fs -mkdir /foo2
Fri Oct 4 09:57:27 MDT 2013
sudo -u hdfs hadoop fs -mkdir /foo3
Fri Oct 4 09:57:29 MDT 2013
sudo -u hdfs hadoop fs -mkdir /foo4
Fri Oct 4 09:57:31 MDT 2013
sudo -u hdfs hadoop fs -mkdir /foo5
Fri Oct 4 09:57:33 MDT 2013
sudo -u hdfs hadoop fs -mkdir /foo6
Fri Oct 4 09:57:35 MDT 2013
sudo -u hdfs hadoop fs -mkdir /foo7
Fri Oct 4 09:57:36 MDT 2013
sudo -u hdfs hadoop fs -mkdir /foo8
Fri Oct 4 09:57:38 MDT 2013
sudo -u hdfs hadoop fs -mkdir /foo9
Fri Oct 4 09:57:40 MDT 2013
sudo -u hdfs hadoop fs -mkdir /foo10
Fri Oct 4 09:57:42 MDT 2013
sudo -u hdfs hadoop fs -mkdir /foo11
Fri Oct 4 09:57:44 MDT 2013
sudo -u hdfs hadoop fs -mkdir /foo12
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------ ---timestamp---
r b swpd free buff cache si so bi bo in cs us sy id wa st
0 0 0 10875508 261608 560260 0 0 0 5 17 9 0 0 99 0 0 2013-10-04 09:57:22 MDT
1 0 0 10857096 261608 560260 0 0 0 42 299 519 8 1 91 0 0 2013-10-04 09:57:24 MDT
2 0 0 10842448 261608 560292 0 0 0 0 1150 845 94 5 1 0 0 2013-10-04 09:57:26 MDT
1 0 0 10835436 261608 560292 0 0 0 0 1051 838 96 4 0 0 0 2013-10-04 09:57:28 MDT
3 0 0 10826628 261608 560292 0 0 0 44 1105 874 92 7 1 0 0 2013-10-04 09:57:30 MDT
3 0 0 10820764 261608 560296 0 0 0 0 1058 858 96 4 0 0 0 2013-10-04 09:57:32 MDT
2 0 0 10815676 261608 560332 0 0 0 2 1118 899 94 6 1 0 0 2013-10-04 09:57:34 MDT
3 0 0 10794528 261608 560300 0 0 0 34 1118 838 95 5 0 0 0 2013-10-04 09:57:36 MDT
1 0 0 10777340 261608 560300 0 0 0 22 1086 823 95 4 1 0 0 2013-10-04 09:57:38 MDT
4 0 0 10864748 261608 560300 0 0 0 20 1123 964 93 7 1 0 0 2013-10-04 09:57:40 MDT
1 0 0 10849984 261608 560300 0 0 0 28 1027 791 95 5 0 0 0 2013-10-04 09:57:42 MDT
3 0 0 10829008 261608 560300 0 0 0 0 1037 946 93 6 1 0 0 2013-10-04 09:57:44 MDT
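For anyone who wants to reproduce a timestamped capture like the one above, one possible approach is the short loop below (only a sketch; newer procps releases also support "vmstat -t", but this loop avoids relying on that option by taking one 2-second sample per iteration and appending the current date to it; stop it with Ctrl-C):
vmstat | head -2 > vmstat-test.txt
while true; do
  echo "$(vmstat 2 2 | tail -1)  $(date '+%Y-%m-%d %H:%M:%S %Z')" >> vmstat-test.txt
done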
Created 10-11-2013 01:06 AM