Member since 02-07-2019 · 1792 Posts · 1 Kudos Received · 0 Solutions
12-10-2019
07:56 AM
To ensure that another NameNode in a cluster is always available when the active NameNode host fails, NameNode high availability (HA) must be enabled and configured on the cluster from the Ambari Web user interface.
This video explains how to launch the Enable NameNode HA wizard and the steps that must be followed to set up NameNode high availability.
Open the YouTube video here
As a prerequisite, ensure the following:
If the HDFS or ZooKeeper services are in Maintenance Mode, the NameNode HA wizard will not complete successfully. HDFS and ZooKeeper must be stopped and started while enabling NameNode HA, and Maintenance Mode would prevent those start and stop operations from occurring.
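The Maintenance Mode state can also be checked (and cleared) through the Ambari REST API before launching the wizard. The commands below are only a sketch, assuming the Ambari Server listens on port 8080 and admin credentials are used; replace <ambari-host> and <cluster-name> with your own values:
# Check the current Maintenance Mode state of HDFS and ZooKeeper
curl -u admin:admin -H "X-Requested-By: ambari" \
  "http://<ambari-host>:8080/api/v1/clusters/<cluster-name>/services/HDFS?fields=ServiceInfo/maintenance_state"
curl -u admin:admin -H "X-Requested-By: ambari" \
  "http://<ambari-host>:8080/api/v1/clusters/<cluster-name>/services/ZOOKEEPER?fields=ServiceInfo/maintenance_state"
# Turn Maintenance Mode off for HDFS if it is reported as ON
curl -u admin:admin -H "X-Requested-By: ambari" -X PUT \
  -d '{"RequestInfo":{"context":"Turn off Maintenance Mode for HDFS"},"Body":{"ServiceInfo":{"maintenance_state":"OFF"}}}' \
  "http://<ambari-host>:8080/api/v1/clusters/<cluster-name>/services/HDFS"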
The Enable NameNode high availability section from the documentation contains the steps mentioned in this video. Recommended links:
Product documentation page
Community Forum
12-10-2019
07:55 AM
Apache Ranger is one of the easiest, most robust, and most flexible frameworks for managing authorization across the different components of a cluster. However, if your policies are not syncing, that can become an issue.
In this video, we troubleshoot Ranger policy synchronization: the components that are involved, how they interact with each other, and a few of the most common issues.
The goal of the video is to understand the main factors to check when a policy synchronization issue occurs and how to resolve it.
Open the video on YouTube here
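A quick first check on the plugin side is the local policy cache each Ranger plugin maintains; if its timestamp is stale, the plugin is not downloading policies from Ranger Admin. The commands below are only a sketch, assuming an HDP-style layout where the cache lives under /etc/ranger/<repository_name>/policycache and Ranger Admin listens on port 6080; adjust the names to your repositories:
# Check when the plugin last refreshed its policy cache (run on the plugin host, e.g. a NameNode for the HDFS plugin)
ls -l /etc/ranger/<repository_name>/policycache/
# Confirm the plugin host can reach Ranger Admin, which serves the policies
curl -s -o /dev/null -w "%{http_code}\n" http://<ranger-admin-host>:6080/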
12-10-2019
07:53 AM
The ZooKeeper transaction logs and snapshot files are not human readable by default. Running a cat command on these files does not give clear information about their content.
The following video explains how to read ZooKeeper transaction logs and snapshots.
Open the video on YouTube here
To view the content of these files, use the following.
To read the snapshots:
java -cp /usr/hdp/current/zookeeper-server/zookeeper.jar:/usr/hdp/current/zookeeper-server/lib/* \
  org.apache.zookeeper.server.SnapshotFormatter <Snapshot file name>
To read the transaction logs:
java -cp /usr/hdp/current/zookeeper-server/zookeeper.jar:/usr/hdp/current/zookeeper-server/lib/* \
  org.apache.zookeeper.server.LogFormatter <Log file name>
The classes that need to be used are located under /usr/hdp/current/zookeeper-server and /usr/hdp/current/zookeeper-server/lib.
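As a worked example, the snapshot and log files live in the version-2 subdirectory of the ZooKeeper dataDir. The paths and file names below are purely illustrative, assuming the HDP default dataDir of /hadoop/zookeeper; check dataDir in /etc/zookeeper/conf/zoo.cfg for the actual location:
# Locate the dataDir configured for this ZooKeeper server
grep dataDir /etc/zookeeper/conf/zoo.cfg
# Render a snapshot in human-readable form (file name is illustrative)
java -cp /usr/hdp/current/zookeeper-server/zookeeper.jar:/usr/hdp/current/zookeeper-server/lib/* \
  org.apache.zookeeper.server.SnapshotFormatter /hadoop/zookeeper/version-2/snapshot.5e00b4e83
# Render a transaction log in human-readable form (file name is illustrative)
java -cp /usr/hdp/current/zookeeper-server/zookeeper.jar:/usr/hdp/current/zookeeper-server/lib/* \
  org.apache.zookeeper.server.LogFormatter /hadoop/zookeeper/version-2/log.5e00b4e84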
12-10-2019
03:59 AM
This video article provides the steps on how to use the Reassign Partitions tool:
Open YouTube video here
Create a file named topics-to-move.json with the following content:
{
"topics": [{"topic":"<partitionName>"}],
"version":1
}
Run the following command:
./kafka-reassign-partitions.sh --zookeeper master:2181 \
  --topics-to-move-json-file topics-to-move.json --broker-list "<brokerID>" \
  --generate
Take the Proposed partition reassignment configuration from the previous command's output and save it in another JSON file named reassign-partition.json.
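For reference, the reassign-partition.json file created this way follows the format below; the topic name, partition numbers, and broker IDs here are purely illustrative:
{
  "version": 1,
  "partitions": [
    {"topic": "<topicName>", "partition": 0, "replicas": [1001, 1002]},
    {"topic": "<topicName>", "partition": 1, "replicas": [1002, 1001]}
  ]
}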
Run the following command:
./kafka-reassign-partitions.sh --zookeeper master:2181 \
  --reassignment-json-file reassign-partition.json --execute
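Optionally, once the reassignment has been started, the same tool can report its progress with the --verify option, using the same JSON file:
./kafka-reassign-partitions.sh --zookeeper master:2181 \
  --reassignment-json-file reassign-partition.json --verify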
12-10-2019
03:55 AM
1 Kudo
At times, a Kafka broker can find one of its log directories at 100% utilization, and the broker process will then fail to start.
This article provides the instructions to manually move partition data between different log directories within a Kafka broker.
Open the video on YouTube here
The Kafka brokers maintain two offset checkpoint files inside each log directory:
replication-offset-checkpoint
recovery-point-offset-checkpoint
Both of these files have the following format:
(a) 1st line: version number
(b) 2nd line: number of topic-partition entries in the file
(c) All remaining lines: one entry per partition maintained within the current log directory, holding its replication offset (or recovery point offset, respectively).
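For illustration, a replication-offset-checkpoint with two entries would look roughly as follows, where each entry line is "<topic> <partition> <offset>" (topic name, partition numbers, and offsets are hypothetical):
0
2
test-topic 0 104523
test-topic 1 98711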
12-10-2019
03:53 AM
1 Kudo
This video explains feasible and efficient ways to troubleshoot performance issues or perform root-cause analysis on a Spark Streaming application, whose logs usually grow past the gigabyte mark. However, this article does not cover yarn-client mode, as yarn-cluster mode is recommended for streaming applications for reasons that will not be discussed in this article.
Open the video on YouTube here
Spark Streaming applications usually run for long periods of time before facing issues that may cause them to be shut down. In other cases, the application is never shut down, but it faces performance degradation during certain peak hours. In either case, the amount and size of the logs keep growing over time, making them really difficult to analyze once they grow past the gigabyte mark.
Spark, like many other applications, uses the log4j facility to handle logs for both the driver and the executors. It is therefore recommended to tune the log4j.properties file to leverage the rolling file appender, which creates a log file, rotates it when a size limit is met, and keeps a number of backup logs as historical information that can later be used for analysis.
Updating the log4j.properties file in the Spark configuration directory is not recommended, as it would have a cluster-wide effect. Instead, it can be used as a template to create a dedicated log4j file for the streaming application without affecting other jobs. As an example, in this video, a log4j.properties file is created from scratch to meet the following conditions:
Each log file will have a maximum size of 100Mb, a reasonable size that can be reviewed on most file editors while holding a reasonable time lapse of Spark events
The latest 10 files are backed up for historical analysis.
The files will be saved in a custom path.
The log4j.properties file can be reused for multiple Spark Streaming applications, and the log files of each application will not overwrite each other; JVM properties are used as a workaround to achieve this.
Both the driver and the executors will have their own log4j properties file. This provides flexibility in configuring the log level for specific classes, the file location, the file size, and so on.
Make the current and previous logs available on the Resource Manager UI.
Procedure
Create a new log4j-driver.properties file, for the Driver:
log4j.rootLogger=INFO, rolling
log4j.appender.rolling=org.apache.log4j.RollingFileAppender
log4j.appender.rolling.layout=org.apache.log4j.PatternLayout
log4j.appender.rolling.layout.conversionPattern=[%d] %p %m (%c)%n
log4j.appender.rolling.maxFileSize=100MB
log4j.appender.rolling.maxBackupIndex=10
log4j.appender.rolling.file=${spark.yarn.app.container.log.dir}/${vm.logging.name}-driver.log
log4j.appender.rolling.encoding=UTF-8
log4j.logger.org.apache.spark=${vm.logging.level}
log4j.logger.org.eclipse.jetty=WARN
The content above leverages two JVM properties:
vm.logging.level, which allows setting a different log level for each application without altering the content of the log4j properties file.
vm.logging.name, which allows having a different driver log file per application by using a different application name for each Spark Streaming application.
Similarly, create a new log4j-executor.properties file, for the Executors:
log4j.rootLogger=INFO, rolling
log4j.appender.rolling=org.apache.log4j.RollingFileAppender
log4j.appender.rolling.layout=org.apache.log4j.PatternLayout
log4j.appender.rolling.layout.conversionPattern=[%d] %p %m (%c)%n
log4j.appender.rolling.maxFileSize=100MB
log4j.appender.rolling.maxBackupIndex=10
log4j.appender.rolling.file=${spark.yarn.app.container.log.dir}/${vm.logging.name}-executor.log
log4j.appender.rolling.encoding=UTF-8
log4j.logger.org.apache.spark=${vm.logging.level}
log4j.logger.org.eclipse.jetty=WARN
The next step is to instruct Spark to use these custom log4j properties files.
Applying the above template to a "real life" KafkaWordCount streaming application in a Kerberized environment would look like the following:
spark-submit --master yarn --deploy-mode cluster --num-executors 3 \
--conf "spark.driver.extraJavaOptions=-Djava.security.auth.login.config=./key.conf \
-Dlog4j.configuration=log4j-driver.properties -Dvm.logging.level=DEBUG -Dvm.logging.name=SparkStreaming-1" \
--conf "spark.executor.extraJavaOptions=-Djava.security.auth.login.config=./key.conf \
-Dlog4j.configuration=log4j-executor.properties -Dvm.logging.level=DEBUG -Dvm.logging.name=SparkStreaming-1" \
--files key.conf,test.keytab,log4j-driver.properties,log4j-executor.properties \
--jars spark-streaming_2.11-2.3.0.2.6.5.0-292.jar, \
--packages org.apache.spark:spark-streaming-kafka-0-8_2.11:2.2.0.2.6.4.0-91,org.apache.spark:spark-streaming_2.11:2.2.0.2.6.4.0-91 \
--class org.apache.spark.examples.streaming.KafkaWordCount \
/usr/hdp/2.6.4.0-91/spark2/examples/jars/spark-examples_2.11-2.2.0.2.6.4.0-91.jar \
node2.fqdn,node3.fqdn,node4.fqdn \
my-consumer-group receiver 2 PLAINTEXTSASL
(Template) Spark on YARN - Cluster mode, log level set to DEBUG and application name "SparkStreaming-1":
spark-submit --master yarn --deploy-mode cluster \
--num-executors 3 \
--files log4j-driver.properties,log4j-executor.properties \
--conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=log4j-driver.properties
-Dvm.logging.level=DEBUG -Dvm.logging.name=SparkStreaming-1" \
--conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=log4j-executor.properties
-Dvm.logging.level=DEBUG -Dvm.logging.name=SparkStreaming-1" \
--class org.apache.spark.examples.SparkPi \
/usr/hdp/current/spark2-client/examples/jars/spark-examples_*.jar 1000
After running the Spark Streaming application, the rolled log files will be present on the NodeManager nodes where an executor is launched, which makes it easier to find and collect the necessary executor logs. The Resource Manager UI will also list the current log and any previous (backup) files.
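To locate those rolled files directly on a NodeManager host, one option is to search the YARN container log directory. The path below is only an assumption (the HDP default yarn.nodemanager.log-dirs of /hadoop/yarn/log) and reuses the application name from the example above; once the application has finished and log aggregation has run, the standard yarn logs command can be used instead:
# On a NodeManager host: find the rolled driver/executor logs for this application (path is an assumption)
find /hadoop/yarn/log -name "SparkStreaming-1-*.log*" 2>/dev/null
# After the application has finished, fetch the aggregated logs
yarn logs -applicationId <application_id>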
12-10-2019
03:50 AM
Ambari Infra Solr can be present on HDP and HDF clusters. HDPSearch Solr is a separate product that can be installed on top of an HDP cluster.
This video talks about the purpose of each, the differences between them, and how to correctly select the product category when opening a support case.
Open YouTube video here
12-10-2019
03:49 AM
This article describes the steps required to complete the setup for accessing Grafana over HTTPS with CA-signed certificates.
Open YouTube video here
Ambari Metrics System includes Grafana, a daemon that runs on a specific host in the cluster and serves pre-built dashboards for visualising metrics collected by the Metrics Collector. For this article, the following servers are used:
172.25.33.152 c3132-node1.user.local (Ambari Server)
172.25.36.9 c3132-node2.user.local (Ambari Metrics Collector + Grafana)
172.25.40.27 c3132-node3.user.local (Ambari Metrics Collector)
172.25.33.163 c3132-node4.user.local
By default, Grafana listens on port TCP/3000:
# for i in $(netstat -utnlp | awk '/grafana/ {print substr($7, 1, length($7)-13)}' |
sort -u) ; do echo ; ps -eo pid,user,command --cols 128 | grep $i | grep -v grep ;
netstat -utnlp | grep $i ; echo ; done
270925 ams /usr/lib/ambari-metrics-grafana/bin/grafana-server
--pidfile=/var/run/ambari-metrics-grafana/grafana-server.pid
tcp6 0 0 :::3000 :::*
LISTEN 270925/grafana-serv
Here, the running process is grafana-server, the owner is ams, and it is listening on port TCP/3000. All Grafana configurations are handled by Ambari and are reflected in the ams-grafana.ini file located in the /etc/ambari-metrics-grafana/conf/ directory. Grafana needs to be restarted for any configuration change to take effect. In enterprises where security is required, limit Grafana access to HTTPS connections only. To enable HTTPS for Grafana, update the following properties:
AmbariUI / Services / Ambari Metrics / Configs -> Advanced ams-grafana-ini
protocol: http by default. For this video, it needs to be changed to https.
ca_cert: The path to the CA root certificate or bundle used to validate the Grafana certificate. Since a PKCS#12 bundle certificate is used here, the CA certificate chain needs to be extracted from it.
cert_file: The path to the certificate. This certificate needs to be in PEM format.
cert_key: The path to the private key that matches the public key of the certificate. This private key needs to be an unencrypted RSA private key.
For this article, the CA will provide us with a certificate bundle located at:
/var/tmp/certificates/GRAFANA
Since the certificate information provided by the CA is a PKCS#12 certificate bundle, complete the following steps:
Extract the root and intermediate certificates, using the following command:
openssl pkcs12 -in c3132-node2.user.local.p12 -out ams-ca.crt -cacerts -nokeys \
  -passin pass:hadoop1234
Extract the server certificate:
openssl pkcs12 -in c3132-node2.user.local.p12 -out ams-grafana.crt -clcerts \
  -nokeys -passin pass:hadoop1234
Extract the private key:
openssl pkcs12 -in c3132-node2.user.local.p12 -nocerts -nodes -out ams-grafana.key \
  -passin pass:hadoop1234
Copy the certificates to a folder with ams user permissions. For this article, the default path and the default names are the following:
cp ams-*.* /etc/ambari-metrics-grafana/conf/
chown ams:hadoop /etc/ambari-metrics-grafana/conf/ams-*.*
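Optionally, before updating the configuration in Ambari, it can be confirmed that the extracted private key actually matches the server certificate; the two digests below should be identical. This check is a suggestion and not part of the original steps:
openssl x509 -noout -modulus -in /etc/ambari-metrics-grafana/conf/ams-grafana.crt | openssl md5
openssl rsa -noout -modulus -in /etc/ambari-metrics-grafana/conf/ams-grafana.key | openssl md5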
Update the Grafana configuration from Ambari:
AmbariUI / Services / Ambari Metrics / Configs -> Advanced ams-grafana-ini
protocol = https
ca_cert = /etc/ambari-metrics-grafana/conf/ams-ca.crt
cert_file = /etc/ambari-metrics-grafana/conf/ams-grafana.crt
cert_key = /etc/ambari-metrics-grafana/conf/ams-grafana.key
Save the changes, and restart all affected services.
Double-check the Grafana log file:
tail -f /var/log/ambari-metrics-grafana/grafana.log
The following log entry confirms that Grafana is listening on port 3000 over HTTPS:
2018/12/12 03:42:41 [I] Listen: https://0.0.0.0:3000
Double-check the certificate in place using:
openssl s_client -connect c3132-node2.user.local:3000 </dev/null
Open Grafana from Ambari to validate that it is working as expected:
AmbariUI / Services / Ambari Metrics / Summary -> Quick Link Grafana
Note: Ignore the warning and proceed.
With all these steps, Grafana is configured to use CA-signed certificates, and the communication is over HTTPS.
12-10-2019
03:45 AM
This video contains a step-by-step process that shows how to connect to Hive running on a secure cluster using a JDBC Uber driver from MS Windows.
Open the video on YouTube here
Prerequisites:
Validate that the username belongs to the same @DOMAIN/realm as the one set up on the cluster nodes.
Install DbVisualizer.
Download the Hive Uber driver for the same version as HDP.
Kerberos Java Config
Get the Kerberos /etc/krb5.conf from the cluster; scp this file from any cluster node to c:\windows.
Rename krb5.conf to c:\windows\krb5.ini.
Edit krb5.ini and add the following property (see video):
udp_preference_limit = 1
DbVisualizer Setup
Add the Hive Uber driver .jar to DbVisualizer as the driver for Hive (see video).
Add startup Java command line options to DbVisualizer for Kerberos, under Tools (see video):
-Dsun.security.krb5.debug=true
-Djavax.security.auth.useSubjectCredsOnly=false
-Djava.security.krb5.conf=c:\windows\krb5.ini
Set up a new connection for the HDP cluster:
Database Server = (hostname of the node running HiveServer2)
Database = change from default to default;principal=hive/_HOST@DOMAIN.COM
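The resulting JDBC URL that DbVisualizer builds from these fields would look roughly like the following; the hostname and port are assumptions (HiveServer2's default binary port is 10000) and must be replaced with your own values:
jdbc:hive2://<hiveserver2-host>:10000/default;principal=hive/_HOST@DOMAIN.COM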
Get a Kerberos session ticket from the DbVisualizer Java JRE:
cd to the folder where DbVisualizer is installed and locate the \jre\bin folder, then run:
kinit username
OR, if a keytab file is used:
kinit -kt keytab username@DOMAIN.COM
Check for the session in the cache file with:
klist
Restart DbVisualizer and test the connection to Hive.
12-10-2019
03:42 AM
This video explains how to configure Ambari Metrics System (AMS) high availability.
Open YouTube video here
To enable AMS high availability, the Collector has to be configured to run in distributed mode. When the Collector is configured for distributed mode, it writes metrics to HDFS, and the components run as distributed processes, which helps to manage CPU and memory.
The following steps assume a cluster configured with a highly available NameNode.
Set the HBase Root directory value to use the HDFS name service instead of the NameNode hostname.
Migrate existing data from the local store to HDFS prior to switching to a distributed mode.
To switch the Metrics Collector from embedded mode to distributed mode, update the Metrics Service operation mode and the location where the metrics are stored. In summary, the following steps are required:
Stop the Ambari Metrics System
Prepare the environment to migrate from the local file system to HDFS
Migrate Collector data to HDFS
Configure distributed mode using Ambari
Restart all affected services and monitor the Collector log
Stop all the services associated with the AMS component using Ambari
AmbariUI / Services / Ambari Metrics / Summary / Action / Stop
Prepare the Environment to migrate from Local File System to HDFS
AMS_User=ams
AMS_Group=hadoop
AMS_Embedded_RootDir=$(grep -C 2 hbase.rootdir /etc/ambari-metrics-collector/conf/hbase-site.xml | awk -F"[<|>|:]" '/value/ {print $4}' | sed 's|//||1')
ActiveNN=$(su -l hdfs -c "hdfs haadmin -getAllServiceState | awk -F '[:| ]' '/active/ {print \$1}'")
NN_Port=$(su -l hdfs -c "hdfs haadmin -getAllServiceState | awk -F '[:| ]' '/active/ {print \$2}'")
HDFS_Name_Service=$(grep -A 1 dfs.nameservice /etc/hadoop/conf/hdfs-site.xml | awk -F"[<|>]" '/value/ {print $3}')
HDFS_AMS_PATH=/apps/ams/metrics
Create the folder for Collector data in HDFS
su -l hdfs -c "hdfs dfs -mkdir -p ${HDFS_AMS_PATH}"
su -l hdfs -c "hdfs dfs -chown ${AMS_User}:${AMS_Group} ${HDFS_AMS_PATH}"
Update permissions to be able to copy collector data from local file system to HDFS
namei -l ${AMS_Embedded_RootDir}/staging
chmod +rx ${AMS_Embedded_RootDir}/staging
Copy collector data from local file system to HDFS
su -l hdfs -c "hdfs dfs -copyFromLocal ${AMS_Embedded_RootDir} hdfs://${ActiveNN}:${NN_Port}${HDFS_AMS_PATH}"
su - hdfs -c "hdfs dfs -chown -R ${AMS_User}:${AMS_Group} ${HDFS_AMS_PATH}"
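Before switching the operation mode, it can be useful to confirm that the Collector data landed in HDFS with the expected ownership; this verification is a suggestion and not part of the original procedure:
su -l hdfs -c "hdfs dfs -ls ${HDFS_AMS_PATH}"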
Configure the Collector to distributed mode using Ambari:
AmbariUI / Services / Ambari Metrics / Configs / Metrics Service operation mode =
distributed
AmbariUI / Services / Ambari Metrics / Configs / Advanced ams-hbase-site /
hbase.cluster.distributed = true
AmbariUI / Services / Ambari Metrics / Configs / Advanced ams-hbase-site /
HBase root directory = hdfs://AMSHA/apps/ams/metrics
AmbariUI / Services / HDFS / Configs / Custom core-site
hadoop.proxyuser.hdfs.groups = *
hadoop.proxyuser.root.groups = *
hadoop.proxyuser.hdfs.hosts = *
hadoop.proxyuser.root.hosts = *
AmbariUI / Services / HDFS / Configs / HDFS Short-circuit read /Advanced
hdfs-site = true (check)
AmbariUI -> Restart All required
Note: Impersonation is the ability to allow a service user to securely access data in Hadoop on behalf of another user. When proxy users are configured, any access through a proxy is executed with the impersonated user's existing privilege levels rather than those of a superuser such as HDFS. The behavior is similar for proxy hosts: they limit the hosts from which impersonated connections are allowed. For this article and for testing purposes, all users and all hosts are allowed.
Additionally, one of the key principles behind Apache Hadoop is the idea that moving computation is cheaper than moving data. With short-circuit local reads, since the client and the data are on the same node, there is no need for the DataNode to be in the data path; the client itself can simply read the data from the local disk, improving performance.
Once AMS is up and running, the following message is displayed in the Metrics Collector log:
2018-12-12 01:21:12,132 INFO org.eclipse.jetty.server.Server: Started @14700ms
2018-12-12 01:21:12,132 INFO org.apache.hadoop.yarn.webapp.WebApps: Web app timeline started at 6188
2018-12-12 01:21:40,633 INFO org.apache.ambari.metrics.core.timeline.availability.MetricCollectorHAController:
######################### Cluster HA state ########################
CLUSTER: ambari-metrics-cluster
RESOURCE: METRIC_AGGREGATORS
PARTITION: METRIC_AGGREGATORS_0 c3132-node2.user.local_12001 ONLINE
PARTITION: METRIC_AGGREGATORS_1 c3132-node2.user.local_12001 ONLINE
##################################################
According to the above message, there is a cluster with only one Collector. The next logical step is to add an additional Collector from the Ambari Server. To do this, go to:
AmbariUI / Hosts / c3132-node3.user.local / Summary -> +ADD -> Metrics Collector
Note: c3132-node3.user.local is the node where you will be adding the Collector. Since distributed mode is already enabled, after adding the collector, start the service. Once the AMS is up and running, the following message is displayed in the Metrics Collector Log:
2018-12-12 01:34:56,060 INFO org.apache.ambari.metrics.core.timeline.availability.MetricCollectorHAController:
######################### Cluster HA state ########################
CLUSTER: ambari-metrics-cluster
RESOURCE: METRIC_AGGREGATORS
PARTITION: METRIC_AGGREGATORS_0 c3132-node2.user.local_12001 ONLINE
PARTITION: METRIC_AGGREGATORS_1 c3132-node3.user.local_12001 ONLINE
##################################################
According to the above message, the cluster has two collectors.