Hadoop can be deployed at a variety of scales, and the requirements at each scale differ. Hadoop has a large number of tunable parameters that can be used to influence its operation. Furthermore, a number of other technologies can be deployed alongside Hadoop for additional capabilities.

Performance Monitoring In Hadoop

Multiple tools exist to monitor large clusters for performance and troubleshooting. This section briefly highlights two such tools.

Ganglia is a performance monitoring framework for distributed systems. It provides a distributed service that collects metrics on individual machines and forwards them to an aggregator, which can report back to an administrator on the global state of a cluster.

Ganglia is designed to be integrated into other applications to collect statistics about their operation. Hadoop includes a performance monitoring framework that can use Ganglia as its backend. The Hadoop wiki has instructions for enabling Ganglia metrics in Hadoop; they are also included below.

After installing and configuring Ganglia on your cluster, direct Hadoop to send its metric reports to Ganglia by creating a file named hadoop-metrics.properties in the $HADOOP_HOME/conf directory. The file should have the following contents:

dfs.class=org.apache.hadoop.metrics.ganglia.GangliaContext
dfs.period=10
dfs.servers=localhost:8649
mapred.class=org.apache.hadoop.metrics.ganglia.GangliaContext
mapred.period=10
mapred.servers=localhost:8649

This assumes that gmond is running on each machine in the cluster. Instructions on the Hadoop wiki note that (in the experience of the wiki article author) this may result in all nodes reporting their results as "localhost" instead of with their individual hostnames. If this problem affects your cluster, an alternate configuration is proposed, in which all Hadoop instances speak directly with gmetad:

dfs.class=org.apache.hadoop.metrics.ganglia.GangliaContext
dfs.period=10
dfs.servers=@GMETAD@:8650
mapred.class=org.apache.hadoop.metrics.ganglia.GangliaContext
mapred.period=10
mapred.servers=@GMETAD@:8650

Here, @GMETAD@ is the hostname of the server on which the gmetad service is running. If you deploy Ganglia and Hadoop on a very large number of machines, evaluate the impact of this configuration against the standard Ganglia configuration, in which individual services talk to gmond on localhost.

Nagios is a machine and service monitoring system designed for large clusters. Nagios provides useful diagnostic information for tuning your cluster, including network, disk, and CPU utilization across machines.

Additional Tips

The following are a few additional small pieces of advice:

Create a separate user named "hadoop" to run your instances; this will separate the Hadoop processes from any users on the system. Do not run Hadoop as root.

If Hadoop is installed in /home/hadoop/hadoop-0.18.0, link /home/hadoop/hadoop to /home/hadoop/hadoop-0.18.0. When upgrading to a newer version in the future, the link can be moved to make this process easier on other scripts that depend on the hadoop/bin directory.
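
The version-independent link described in the last tip can be created and later moved with a small script. The following is a minimal sketch in Python, not part of Hadoop itself: the helper name point_link_at is hypothetical, and the sketch assumes a POSIX system where renaming one symbolic link over another is atomic, so scripts that use hadoop/bin never see a missing path during an upgrade.

```python
import os
import tempfile


def point_link_at(link_path, target_dir):
    """Make link_path a symbolic link to target_dir, replacing an older link.

    Hypothetical helper for illustration; assumes a POSIX filesystem.
    """
    # Refuse to overwrite a real file or directory that is not a link.
    if os.path.exists(link_path) and not os.path.islink(link_path):
        raise ValueError(link_path + " exists and is not a symbolic link")

    parent = os.path.dirname(os.path.abspath(link_path))
    tmp_link = os.path.join(parent, ".tmp-" + os.path.basename(link_path))

    # Clear any leftover temporary link from an interrupted earlier run.
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)

    # Build the new link under a temporary name, then rename it into place.
    os.symlink(target_dir, tmp_link)
    os.replace(tmp_link, link_path)


if __name__ == "__main__":
    # Demonstration in a scratch directory instead of /home/hadoop.
    # "hadoop-0.19.0" stands in for whichever newer version is installed.
    with tempfile.TemporaryDirectory() as home:
        old = os.path.join(home, "hadoop-0.18.0")
        new = os.path.join(home, "hadoop-0.19.0")
        os.mkdir(old)
        os.mkdir(new)
        link = os.path.join(home, "hadoop")

        point_link_at(link, old)   # initial install
        point_link_at(link, new)   # later upgrade
        print(os.readlink(link))
```

Building the link under a temporary name and renaming it avoids the short window in which a remove-then-create approach would leave /home/hadoop/hadoop missing.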
I have developed a custom Solr search component for which I need to write unit tests. As I have seen in the code of other Solr components, writing unit tests in Solr is done by extending the SolrTestCaseJ4 class. Unfortunately, SolrTestCaseJ4 doesn't deal with testing in a distributed setting, and my custom component works only in such a setting. As a matter of fact, my component deliberately returns empty responses when not in a distributed setting.
I'm trying to think of a way to use the BaseDistributedSearchTestCase class to test my component. The problem is that the way BaseDistributedSearchTestCase works does not fit my case. With BaseDistributedSearchTestCase you define a single test method in which you index all the documents and perform some queries. Running the test executes the requests against both a distributed setup and a single-core setup, then compares the two sets of responses to verify that they are equal. I cannot explicitly assert anything in that flow.

How do I write unit tests for a Solr distributed component?