12-10-2019
08:01 AM
Sometimes, a node needs to be decommissioned or has downtime of undetermined length for repairs. If the node hosts a ResourceManager, move the ResourceManager to a new host using the Move ResourceManager wizard from the Ambari Web User Interface. The wizard describes the set of automated steps taken to move a ResourceManager to a new host. Since YARN and MapReduce2 will be restarted, a cluster maintenance window must be planned and cluster downtime expected.
This video describes how to move a Resource Manager to a new host using the Resource Manager Move Wizard from Ambari Web User Interface.
Open the video on YouTube here
To move YARN Resource Manager to a new host with Ambari, do the following:
In Ambari Web, browse to Services > YARN > Summary.
Select Service Actions and choose Move ResourceManager. The Move ResourceManager wizard launches, describing a set of automated steps that must be followed to move one ResourceManager to a new host.
Click Get Started. This wizard will provide a walk-through to move the ResourceManager.
The following services will be restarted as part of the wizard:
YARN
MAPREDUCE2
You should plan a cluster maintenance window and prepare for cluster downtime when moving the ResourceManager.
Click Next.
Select the target host to assign the ResourceManager to, then click Next.
Review and confirm the host selections.
Expand YARN if necessary, to review all the configuration changes proposed for YARN.
Click Deploy to approve the changes and automatically start moving the ResourceManager to the new host.
On Configure Components, click Complete when all the progress bars are completed.
After Ambari Web reloads, there will be some alerts. Wait a few minutes until all the services restart.
Restart any components using Ambari Web, if necessary.
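Once the move has finished, a quick command-line check can confirm that the NodeManagers have re-registered with the ResourceManager on its new host. This is an optional sanity check, not part of the wizard; run it from any host with a YARN client configuration:
# Every worker node should be listed and report a RUNNING state
yarn node -list -all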
REFERENCE:
http://docs.hortonworks.com (official product documentation)
http://community.hortonworks.com (community forum)
12-10-2019
08:00 AM
To ensure that another ResourceManager is available if the active ResourceManager in a cluster fails, ResourceManager high availability (HA) should be enabled and configured.
In an HDP 2.2 or later environment, high availability (HA) can be configured for the ResourceManager by using the Enable ResourceManager HA wizard. To use the wizard, there must be at least three hosts in the cluster and Apache ZooKeeper servers must be running.
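After the wizard completes, the HA state of the two ResourceManagers can be checked from the command line. A minimal check, assuming the Ambari-default ResourceManager IDs rm1 and rm2 (adjust if yarn.resourcemanager.ha.rm-ids lists different IDs):
# One ResourceManager should report active and the other standby
yarn rmadmin -getServiceState rm1
yarn rmadmin -getServiceState rm2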
The Enable ResourceManager high availability section from the documentation contains the steps mentioned in this video.
Open the video on YouTube here
Recommended links:
Product documentation page
Community Forum
12-10-2019
07:56 AM
To ensure that another NameNode is always available when the active NameNode host fails, NameNode high availability (HA) should be enabled and configured on the cluster from the Ambari Web User Interface.
This video explains how to launch the Enable NameNode HA wizard and the steps that must be followed to set up NameNode high availability.
Open the YouTube video here
As a prerequisite, ensure the following:
HDFS and ZooKeeper must not be in Maintenance Mode. Enabling NameNode HA requires stopping and starting those services, and Maintenance Mode prevents the start and stop operations from occurring, so the NameNode HA wizard will not complete successfully.
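Once the wizard finishes, the state of both NameNodes can be verified from the command line. A minimal check, assuming the Ambari-default NameNode IDs nn1 and nn2 (adjust if dfs.ha.namenodes.<nameservice> lists different IDs):
# One NameNode should report active and the other standby
su -l hdfs -c "hdfs haadmin -getServiceState nn1"
su -l hdfs -c "hdfs haadmin -getServiceState nn2"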
The Enable NameNode high availability section from the documentation contains the steps mentioned in this video.
Recommended links:
Product documentation page
Community Forum
12-10-2019
07:55 AM
Apache Ranger is one of the easiest, most robust, and most flexible frameworks for managing authorization across the different components of a cluster. However, if your policies are not syncing, that can become an issue.
In this video, we will be troubleshooting Ranger policy synchronization. We will look into the components that are involved, how they interact with each other, and a few of the most common issues.
The goal of the video is to understand the main factors to check when a policy synchronization issue occurs and how to solve them.
Open the video on YouTube here
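As a starting point when policies do not appear to sync, compare what the plugin has cached locally with what Ranger Admin is serving, and check the plugin host's component log for refresher errors. A minimal sketch, assuming the HDFS plugin and default cache locations; the cache directory, service name, and log path will differ per component and cluster:
# The policy cache file's timestamp shows when the plugin last downloaded policies
ls -l /etc/ranger/<service_name>/policycache/
# For the HDFS plugin, look for policy refresher errors in the NameNode log
grep -i "PolicyRefresher" /var/log/hadoop/hdfs/hadoop-hdfs-namenode-*.log | tail -20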
12-10-2019
07:53 AM
The ZooKeeper transaction log and snapshot files are not human readable by default. Running a cat command on these files does not give clear information on their content.
The following video explains how to read ZooKeeper transaction logs and snapshots.
Open the video on YouTube here
To view the content of these files, use the following.
To read the snapshots:
java -cp '/usr/hdp/current/zookeeper-server/zookeeper.jar:/usr/hdp/current/zookeeper-server/lib/*' org.apache.zookeeper.server.SnapshotFormatter <Snapshot file name>
To read the transaction logs:
java -cp '/usr/hdp/current/zookeeper-server/zookeeper.jar:/usr/hdp/current/zookeeper-server/lib/*' org.apache.zookeeper.server.LogFormatter <Log file name>
The classes that need to be used are located under /usr/hdp/current/zookeeper-server and /usr/hdp/current/zookeeper-server/lib.
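As a usage example, the snapshot and transaction log files live in the version-2 subdirectory of the ZooKeeper dataDir. The sketch below assumes the common HDP default dataDir of /hadoop/zookeeper and an example log file name; check the dataDir property in zoo.cfg if the path differs:
# Snapshot files are named snapshot.<zxid>, transaction logs are named log.<zxid>
ls -l /hadoop/zookeeper/version-2/
# Dump one transaction log in human-readable form (file name is an example)
java -cp '/usr/hdp/current/zookeeper-server/zookeeper.jar:/usr/hdp/current/zookeeper-server/lib/*' org.apache.zookeeper.server.LogFormatter /hadoop/zookeeper/version-2/log.100000001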
12-10-2019
03:59 AM
This video article provides the steps on how to use the Reassign Partitions tool (kafka-reassign-partitions.sh):
Open YouTube video here
Create a file named topics-to-move.json with the following content:
{
"topics": [{"topic":"<partitionName>"}],
"version":1
}
Run the following command:
./kafka-reassign-partitions.sh --zookeeper master:2181 --topics-to-move-json-file topics-to-move.json --broker-list "<brokerID>" --generate
Take the Proposed partition reassignment configuration section from the output of the previous command and save it to another JSON file named reassign-partition.json, as illustrated below.
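A hypothetical example of what reassign-partition.json might look like for a single-partition topic being moved to broker 2 (the topic name, partition number, and broker ID are placeholders; use the JSON that --generate actually proposes for your cluster):
{
  "version": 1,
  "partitions": [{"topic": "<topicName>", "partition": 0, "replicas": [2]}]
}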
Run the following command:
./kafka-reassign-partitions.sh --zookeeper master:2181 --reassignment-json-file reassign-partition.json --execute
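Optionally, the same tool can confirm when the reassignment has completed. This verification step is not part of the original list, but it uses the tool's standard --verify option:
# Reports, per partition, whether the reassignment completed or is still in progress
./kafka-reassign-partitions.sh --zookeeper master:2181 --reassignment-json-file reassign-partition.json --verify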
12-10-2019
03:55 AM
At times, a Kafka broker can find one of its log directories at 100% utilization, and the broker process will then fail to start.
This article provides the instructions to manually move partition data between different log directories within a Kafka Broker.
Open the video on YouTube here
The Kafka brokers maintain two offset checkpoint files inside each log directory:
replication-offset-checkpoint
recovery-point-offset-checkpoint
Both of these files have the following format:
(a) 1st line: version number
(b) 2nd line: number of topic-partition entries in the file
(c) All remaining lines: one entry per topic-partition maintained within the current log directory, containing its replication offset / recovery point offset (see the example below).
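For illustration, a hypothetical replication-offset-checkpoint covering two partitions of a made-up topic named events could look like the following: version 0, two entries, then one "<topic> <partition> <offset>" line per partition.
0
2
events 0 120393
events 1 119984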
12-10-2019
03:50 AM
Ambari Infra Solr can be present on HDP and HDF clusters. HDPSearch Solr is a separate product that can be installed on top of an HDP cluster.
This video talks about the goal of each, the differences between them, and how to correctly select the product category when opening a support case.
Open YouTube video here
12-10-2019
03:49 AM
This article describes the steps required to complete the setup for accessing Grafana over HTTPS with CA-signed certificates.
Open YouTube video here
Ambari Metrics System includes Grafana, a daemon that runs on a specific host in the cluster and serves pre-built dashboards for visualizing metrics collected by the Metrics Collector. For this article, the following servers are used:
172.25.33.152 c3132-node1.user.local (Ambari Server)
172.25.36.9 c3132-node2.user.local (Ambari Metrics Collector + Grafana)
172.25.40.27 c3132-node3.user.local (Ambari Metrics Collector)
172.25.33.163 c3132-node4.user.local
By default, Grafana listens on port TCP/3000:
# for i in $(netstat -utnlp | awk '/grafana/ {print substr($7, 1, length($7)-13)}' | sort -u) ; do echo ; ps -eo pid,user,command --cols 128 | grep $i | grep -v grep ; netstat -utnlp | grep $i ; echo ; done
270925 ams /usr/lib/ambari-metrics-grafana/bin/grafana-server --pidfile=/var/run/ambari-metrics-grafana/grafana-server.pid
tcp6 0 0 :::3000 :::* LISTEN 270925/grafana-serv
Here, the running process is grafana-server, the owner is ams, and it is listening on port TCP/3000.
All the configuration for Grafana is handled by Ambari and is reflected in the ams-grafana.ini file located in the /etc/ambari-metrics-grafana/conf/ directory. Grafana needs to be restarted for any configuration change to take effect.
In enterprises where security is required, limit Grafana access to HTTPS connections only. To enable HTTPS for Grafana, update the following properties:
AmbariUI / Services / Ambari Metrics / Configs -> Advanced ams-grafana-ini
protocol: http by default. For this article, change it to https.
ca_cert: The path to the CA root certificate or bundle used to validate the Grafana certificate. Since a PKCS#12 bundle certificate is used here, the CA certificate chain needs to be extracted from it.
cert_file: The path to the certificate. This certificate needs to be in PEM format.
cert_key: The path to the private key that matches the certificate's public key. This needs to be an unencrypted RSA private key.
For this article, the CA will provide us with a certificate bundle located at:
/var/tmp/certificates/GRAFANA
Since the certificate information provided by the CA is a PKCS#12 certificate bundle, complete the following steps:
Extract the root and intermediate certificates, using the following command:
openssl pkcs12 -in c3132-node2.user.local.p12 -out ams-ca.crt -cacerts -nokeys -passin pass:hadoop1234
Extract the server certificate:
openssl pkcs12 -in c3132-node2.user.local.p12 -out ams-grafana.crt -clcerts -nokeys -passin pass:hadoop1234
Extract the private key:
openssl pkcs12 -in c3132-node2.user.local.p12 -nocerts -nodes -out ams-grafana.key -passin pass:hadoop1234
Copy the certificates to a folder the ams user has permissions on. For this article, the default path and the default names are used:
cp ams-*.* /etc/ambari-metrics-grafana/conf/
chown ams:hadoop /etc/ambari-metrics-grafana/conf/ams-*.*
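Optionally (not part of the original steps), verify that the extracted certificate and private key belong together before handing them to Grafana; the two modulus hashes below must be identical:
# The MD5 of the certificate modulus and of the key modulus must match
openssl x509 -noout -modulus -in /etc/ambari-metrics-grafana/conf/ams-grafana.crt | openssl md5
openssl rsa -noout -modulus -in /etc/ambari-metrics-grafana/conf/ams-grafana.key | openssl md5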
Update the Grafana configuration from Ambari:
AmbariUI / Services / Ambari Metrics / Configs -> Advanced ams-grafana-ini
protocol = https
ca_cert = /etc/ambari-metrics-grafana/conf/ams-ca.crt
cert_file = /etc/ambari-metrics-grafana/conf/ams-grafana.crt
cert_key = /etc/ambari-metrics-grafana/conf/ams-grafana.key
Save the changes, and restart all affected services.
Double-check the Grafana log file:
tail -f /var/log/ambari-metrics-grafana/grafana.log
The following line confirms that Grafana is listening on port 3000 over HTTPS:
2018/12/12 03:42:41 [I] Listen: https://0.0.0.0:3000
Double-check the certificate in place using:
openssl s_client -connect c3132-node2.user.local:3000 </dev/null
Open Grafana from Ambari to validate that it is working as expected:
AmbariUI / Services / Ambari Metrics / Summary -> Quick Link Grafana
Note: Ignore the warning and proceed.
With all these steps, Grafana is configured to use CA-signed certificates and the communication is over HTTPS.
12-10-2019
03:42 AM
This video explains how to configure Ambari Metrics System (AMS) high availability.
Open YouTube video here
To enable AMS high availability, the collector has to be configured to run in a distributed mode. When the Collector is configured for distributed mode, it writes metrics to HDFS, and the components run in distributed processes, which helps to manage CPU and memory.
The following steps assume a cluster configured for a highly available NameNode.
Set the HBase Root directory value to use the HDFS name service instead of the NameNode hostname.
Migrate existing data from the local store to HDFS prior to switching to a distributed mode.
To switch the Metrics Collector from embedded mode to distributed mode, update the Metrics Service operation mode and the location where the metrics are being stored. In summary, the following steps are required:
Stop Ambari Metrics System
Prepare the Environment to migrate from Local File System to HDFS
Migrate Collector Data to HDFS
Configure Distributed Mode using Ambari
Restart all affected services and monitor the Collector log
Stop all the services associated with the AMS component using Ambari
AmbariUI / Services / Ambari Metrics / Summary / Action / Stop
Prepare the Environment to migrate from Local File System to HDFS
AMS_User=ams
AMS_Group=hadoop
AMS_Embedded_RootDir=$(grep -C 2 hbase.rootdir /etc/ambari-metrics-collector/conf/hbase-site.xml | awk -F"[<|>|:]" '/value/ {print $4}' | sed 's|//||1')
ActiveNN=$(su -l hdfs -c "hdfs haadmin -getAllServiceState | awk -F '[:| ]' '/active/ {print \$1}'")
NN_Port=$(su -l hdfs -c "hdfs haadmin -getAllServiceState | awk -F '[:| ]' '/active/ {print \$2}'")
HDFS_Name_Service=$(grep -A 1 dfs.nameservice /etc/hadoop/conf/hdfs-site.xml | awk -F"[<|>]" '/value/ {print $3}')
HDFS_AMS_PATH=/apps/ams/metrics
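Before copying anything, it can be worth echoing the variables to confirm they resolved to sensible values. This quick sanity check is not part of the original steps:
echo "AMS root dir:      ${AMS_Embedded_RootDir}"
echo "Active NameNode:   ${ActiveNN}:${NN_Port}"
echo "HDFS name service: ${HDFS_Name_Service}"
echo "HDFS target path:  ${HDFS_AMS_PATH}"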
Create the folder for Collector data in HDFS
su -l hdfs -c "hdfs dfs -mkdir -p ${HDFS_AMS_PATH}"
su -l hdfs -c "hdfs dfs -chown ${AMS_User}:${AMS_Group} ${HDFS_AMS_PATH}"
Update permissions to be able to copy collector data from local file system to HDFS
namei -l ${AMS_Embedded_RootDir}/staging
chmod +rx ${AMS_Embedded_RootDir}/staging
Copy collector data from local file system to HDFS
su -l hdfs -c "hdfs dfs -copyFromLocal ${AMS_Embedded_RootDir} hdfs://${ActiveNN}:${NN_Port}${HDFS_AMS_PATH}"
su - hdfs -c "hdfs dfs -chown -R ${AMS_User}:${AMS_Group} ${HDFS_AMS_PATH}"
Configure the Collector to distributed mode using Ambari:
AmbariUI / Services / Ambari Metrics / Configs / Metrics Service operation mode = distributed
AmbariUI / Services / Ambari Metrics / Configs / Advanced ams-hbase-site / hbase.cluster.distributed = true
AmbariUI / Services / Ambari Metrics / Configs / Advanced ams-hbase-site / HBase root directory = hdfs://AMSHA/apps/ams/metrics
AmbariUI / Services / HDFS / Configs / Custom core-site
hadoop.proxyuser.hdfs.groups = *
hadoop.proxyuser.root.groups = *
hadoop.proxyuser.hdfs.hosts = *
hadoop.proxyuser.root.hosts = *
AmbariUI / Services / HDFS / Configs / Advanced hdfs-site / HDFS Short-circuit read = true (check)
AmbariUI -> Restart All required
Note: Impersonation is the ability to allow a service user to securely access data in Hadoop on behalf of another user. When proxy users are configured, any access through a proxy is executed with the impersonated user's existing privilege levels rather than those of a superuser such as hdfs. The behavior is similar for proxy hosts: they limit the hosts from which impersonated connections are allowed. For this article and testing purposes, all users and all hosts are allowed.
Additionally, one of the key principles behind Apache Hadoop is the idea that moving computation is cheaper than moving data. With short-circuit local reads, since the client and the data are on the same node, there is no need for the DataNode to be in the data path; the client itself can simply read the data from the local disk, improving performance.
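For reference, the short-circuit read checkbox in Ambari maps to hdfs-site.xml properties like the following (the domain socket path shown is the usual HDP default and may differ on your cluster):
dfs.client.read.shortcircuit = true
dfs.domain.socket.path = /var/lib/hadoop-hdfs/dn_socket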
Once AMS is up and running, the following message is displayed in the Metrics Collector log:
2018-12-12 01:21:12,132 INFO org.eclipse.jetty.server.Server: Started @14700ms
2018-12-12 01:21:12,132 INFO org.apache.hadoop.yarn.webapp.WebApps: Web app timeline started at 6188
2018-12-12 01:21:40,633 INFO org.apache.ambari.metrics.core.timeline.availability.MetricCollectorHAController:
######################### Cluster HA state ########################
CLUSTER: ambari-metrics-cluster
RESOURCE: METRIC_AGGREGATORS
PARTITION: METRIC_AGGREGATORS_0 c3132-node2.user.local_12001 ONLINE
PARTITION: METRIC_AGGREGATORS_1 c3132-node2.user.local_12001 ONLINE
##################################################
According to the above message, there is a cluster with only one Collector. The next logical step is to add an additional Collector from Ambari. To do this, do the following:
AmbariUI / Hosts / c3132-node3.user.local / Summary -> +ADD -> Metrics Collector
Note: c3132-node3.user.local is the node where you will be adding the Collector. Since distributed mode is already enabled, after adding the collector, start the service. Once the AMS is up and running, the following message is displayed in the Metrics Collector Log:
2018-12-12 01:34:56,060 INFO org.apache.ambari.metrics.core.timeline.availability.MetricCollectorHAController:
######################### Cluster HA state ########################
CLUSTER: ambari-metrics-cluster
RESOURCE: METRIC_AGGREGATORS
PARTITION: METRIC_AGGREGATORS_0 c3132-node2.user.local_12001 ONLINE
PARTITION: METRIC_AGGREGATORS_1 c3132-node3.user.local_12001 ONLINE
##################################################
According to the above message, the cluster has two collectors.