Member since 02-07-2019 · 1792 Posts · 1 Kudos Received · 0 Solutions
12-10-2019
07:56 AM
To ensure that another NameNode in a cluster is always available when the active NameNode host fails, NameNode high availability (HA) must be enabled and configured on the cluster from the Ambari Web user interface.
This video explains how to launch the Enable NameNode HA wizard and the steps that must be followed to set up NameNode high availability.
Open the YouTube video here
As a prerequisite, ensure the following:
If the HDFS or ZooKeeper services are in Maintenance Mode, the NameNode HA wizard will not complete successfully. HDFS and ZooKeeper must be stopped and started while enabling NameNode HA, and Maintenance Mode would prevent those start and stop operations from occurring.
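The Maintenance Mode state can also be checked (and cleared) through the Ambari REST API before launching the wizard. The commands below are only a sketch, assuming the Ambari Server listens on port 8080 and admin credentials are used; replace <ambari-host> and <cluster-name> with your own values:
# Check the current Maintenance Mode state of HDFS and ZooKeeper
curl -u admin:admin -H "X-Requested-By: ambari" \
  "http://<ambari-host>:8080/api/v1/clusters/<cluster-name>/services/HDFS?fields=ServiceInfo/maintenance_state"
curl -u admin:admin -H "X-Requested-By: ambari" \
  "http://<ambari-host>:8080/api/v1/clusters/<cluster-name>/services/ZOOKEEPER?fields=ServiceInfo/maintenance_state"
# Turn Maintenance Mode off for HDFS if it is reported as ON
curl -u admin:admin -H "X-Requested-By: ambari" -X PUT \
  -d '{"RequestInfo":{"context":"Turn off Maintenance Mode for HDFS"},"Body":{"ServiceInfo":{"maintenance_state":"OFF"}}}' \
  "http://<ambari-host>:8080/api/v1/clusters/<cluster-name>/services/HDFS"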
The Enable NameNode high availability section from the documentation contains the steps mentioned in this video. Recommended links:
Product documentation page
Community Forum
12-10-2019
07:55 AM
Apache Ranger is one of the easiest, most robust, and most flexible frameworks for managing authorization across the different components of a cluster. However, if your policies are not syncing, that can become an issue.
In this video, we troubleshoot Ranger policy synchronization: the components that are involved, how they interact with each other, and a few of the most common issues.
The goal of the video is to understand the main factors to check when a policy synchronization issue occurs and how to resolve it.
Open the video on YouTube here
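A quick first check on the plugin side is the local policy cache each Ranger plugin maintains; if its timestamp is stale, the plugin is not downloading policies from Ranger Admin. The commands below are only a sketch, assuming an HDP-style layout where the cache lives under /etc/ranger/<repository_name>/policycache and Ranger Admin listens on port 6080; adjust the names to your repositories:
# Check when the plugin last refreshed its policy cache (run on the plugin host, e.g. a NameNode for the HDFS plugin)
ls -l /etc/ranger/<repository_name>/policycache/
# Confirm the plugin host can reach Ranger Admin, which serves the policies
curl -s -o /dev/null -w "%{http_code}\n" http://<ranger-admin-host>:6080/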
12-10-2019
07:53 AM
The ZooKeeper transaction logs and snapshot files are not human readable by default. Running a cat command on these files does not give clear information about their content.
The following video explains how to read ZooKeeper transaction logs and snapshots.
Open the video on YouTube here
To view the content of these files, use the following.
To read the snapshots:
java -cp /usr/hdp/current/zookeeper-server/zookeeper.jar:/usr/hdp/current/zookeeper-server/lib/* \
  org.apache.zookeeper.server.SnapshotFormatter <Snapshot file name>
To read the transaction logs:
java -cp /usr/hdp/current/zookeeper-server/zookeeper.jar:/usr/hdp/current/zookeeper-server/lib/* \
  org.apache.zookeeper.server.LogFormatter <Log file name>
The classes that need to be used are located under /usr/hdp/current/zookeeper-server and /usr/hdp/current/zookeeper-server/lib.
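As a worked example, the snapshot and log files live in the version-2 subdirectory of the ZooKeeper dataDir. The paths and file names below are purely illustrative, assuming the HDP default dataDir of /hadoop/zookeeper; check dataDir in /etc/zookeeper/conf/zoo.cfg for the actual location:
# Locate the dataDir configured for this ZooKeeper server
grep dataDir /etc/zookeeper/conf/zoo.cfg
# Render a snapshot in human-readable form (file name is illustrative)
java -cp /usr/hdp/current/zookeeper-server/zookeeper.jar:/usr/hdp/current/zookeeper-server/lib/* \
  org.apache.zookeeper.server.SnapshotFormatter /hadoop/zookeeper/version-2/snapshot.5e00b4e83
# Render a transaction log in human-readable form (file name is illustrative)
java -cp /usr/hdp/current/zookeeper-server/zookeeper.jar:/usr/hdp/current/zookeeper-server/lib/* \
  org.apache.zookeeper.server.LogFormatter /hadoop/zookeeper/version-2/log.5e00b4e84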
12-10-2019
03:59 AM
This video article provides the steps on how to use the Reassign Partitions tool:
Open YouTube video here
Create a file named topics-to-move.json with the following content:
{
"topics": [{"topic":"<partitionName>"}],
"version":1
}
Run the following command:
./kafka-reassign-partitions.sh --zookeeper master:2181 \
  --topics-to-move-json-file topics-to-move.json --broker-list "<brokerID>" \
  --generate
Take the Proposed partition reassignment configuration from the previous command's output and save it in another JSON file named reassign-partition.json.
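For reference, the reassign-partition.json file created this way follows the format below; the topic name, partition numbers, and broker IDs here are purely illustrative:
{
  "version": 1,
  "partitions": [
    {"topic": "<topicName>", "partition": 0, "replicas": [1001, 1002]},
    {"topic": "<topicName>", "partition": 1, "replicas": [1002, 1001]}
  ]
}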
Run the following command:
./kafka-reassign-partitions.sh --zookeeper master:2181 \
  --reassignment-json-file reassign-partition.json --execute
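Optionally, once the reassignment has been started, the same tool can report its progress with the --verify option, using the same JSON file:
./kafka-reassign-partitions.sh --zookeeper master:2181 \
  --reassignment-json-file reassign-partition.json --verify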
12-10-2019
03:55 AM
1 Kudo
At times, a Kafka broker can find one of its log directories at 100% utilization, and the broker process will then fail to start.
This article provides the instructions to manually move partition data between different log directories within a Kafka broker.
Open the video on YouTube here
The Kafka brokers maintain two offset checkpoint files inside each log directory:
replication-offset-checkpoint
recovery-point-offset-checkpoint
Both of these files have the following format:
(a) 1st line: version number
(b) 2nd line: number of topic-partition entries in the file
(c) All remaining lines: one entry per partition maintained within the current log directory, holding its replication offset (or recovery point offset, respectively).
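For illustration, a replication-offset-checkpoint with two entries would look roughly as follows, where each entry line is "<topic> <partition> <offset>" (topic name, partition numbers, and offsets are hypothetical):
0
2
test-topic 0 104523
test-topic 1 98711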
12-10-2019
03:53 AM
1 Kudo
This video explains feasible and efficient ways to troubleshoot performance issues or perform root-cause analysis on a Spark Streaming application, whose logs usually grow past the gigabyte mark. However, this article does not cover yarn-client mode, as yarn-cluster mode is recommended for streaming applications for reasons that will not be discussed in this article.
Open the video on YouTube here
Spark Streaming applications usually run for long periods of time before facing issues that may cause them to be shut down. In other cases, the application is never shut down, but it faces performance degradation during certain peak hours. In either case, the amount and size of the logs keep growing over time, making them really difficult to analyze once they grow past the gigabyte mark.
Spark, like many other applications, uses the log4j facility to handle logs for both the driver and the executors. It is therefore recommended to tune the log4j.properties file to leverage the rolling file appender, which creates a log file, rotates it when a size limit is met, and keeps a number of backup logs as historical information that can later be used for analysis.
Updating the log4j.properties file in the Spark configuration directory is not recommended, as it would have a cluster-wide effect. Instead, it can be used as a template to create a dedicated log4j file for the streaming application without affecting other jobs. As an example, in this video, a log4j.properties file is created from scratch to meet the following conditions:
Each log file will have a maximum size of 100Mb, a reasonable size that can be reviewed on most file editors while holding a reasonable time lapse of Spark events
The latest 10 files are backed up for historical analysis.
The files will be saved in a custom path.
The log4j.properties file can be reused for multiple Spark Streaming applications, and the log files of each application will not overwrite each other; JVM properties are used as a workaround to achieve this.
Both the driver and the executors will have their own log4j properties file. This provides flexibility in configuring the log level for specific classes, the file location, the file size, and so on.
Make the current and previous logs available on the Resource Manager UI.
Procedure
Create a new log4j-driver.properties file, for the Driver:
log4j.rootLogger=INFO, rolling
log4j.appender.rolling=org.apache.log4j.RollingFileAppender
log4j.appender.rolling.layout=org.apache.log4j.PatternLayout
log4j.appender.rolling.layout.conversionPattern=[%d] %p %m (%c)%n
log4j.appender.rolling.maxFileSize=100MB
log4j.appender.rolling.maxBackupIndex=10
log4j.appender.rolling.file=${spark.yarn.app.container.log.dir}/${vm.logging.name}-driver.log
log4j.appender.rolling.encoding=UTF-8
log4j.logger.org.apache.spark=${vm.logging.level}
log4j.logger.org.eclipse.jetty=WARN
The content above leverages two JVM properties:
vm.logging.level, which allows setting a different log level for each application without altering the content of the log4j properties file.
vm.logging.name, which allows having a different driver log file per application by using a different application name for each Spark Streaming application.
Similarly, create a new log4j-executor.properties file, for the Executors:
log4j.rootLogger=INFO, rolling
log4j.appender.rolling=org.apache.log4j.RollingFileAppender
log4j.appender.rolling.layout=org.apache.log4j.PatternLayout
log4j.appender.rolling.layout.conversionPattern=[%d] %p %m (%c)%n
log4j.appender.rolling.maxFileSize=100MB
log4j.appender.rolling.maxBackupIndex=10
log4j.appender.rolling.file=${spark.yarn.app.container.log.dir}/${vm.logging.name}-executor.log
log4j.appender.rolling.encoding=UTF-8
log4j.logger.org.apache.spark=${vm.logging.level}
log4j.logger.org.eclipse.jetty=WARN
The next step is to instruct Spark to use these custom log4j properties files.
Applying the above template to a "real life" KafkaWordCount streaming application in a Kerberized environment would look like the following:
spark-submit --master yarn --deploy-mode cluster --num-executors 3 \
--conf "spark.driver.extraJavaOptions=-Djava.security.auth.login.config=./key.conf \
-Dlog4j.configuration=log4j-driver.properties -Dvm.logging.level=DEBUG -Dvm.logging.name=SparkStreaming-1" \
--conf "spark.executor.extraJavaOptions=-Djava.security.auth.login.config=./key.conf \
-Dlog4j.configuration=log4j-executor.properties -Dvm.logging.level=DEBUG -Dvm.logging.name=SparkStreaming-1" \
--files key.conf,test.keytab,log4j-driver.properties,log4j-executor.properties \
--jars spark-streaming_2.11-2.3.0.2.6.5.0-292.jar, \
--packages org.apache.spark:spark-streaming-kafka-0-8_2.11:2.2.0.2.6.4.0-91,org.apache.spark:spark-streaming_2.11:2.2.0.2.6.4.0-91 \
--class org.apache.spark.examples.streaming.KafkaWordCount \
/usr/hdp/2.6.4.0-91/spark2/examples/jars/spark-examples_2.11-2.2.0.2.6.4.0-91.jar \
node2.fqdn,node3.fqdn,node4.fqdn \
my-consumer-group receiver 2 PLAINTEXTSASL
(Template) Spark on YARN - Cluster mode, log level set to DEBUG and application name "SparkStreaming-1":
spark-submit --master yarn --deploy-mode cluster \
--num-executors 3 \
--files log4j-driver.properties,log4j-executor.properties \
--conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=log4j-driver.properties
-Dvm.logging.level=DEBUG -Dvm.logging.name=SparkStreaming-1" \
--conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=log4j-executor.properties
-Dvm.logging.level=DEBUG -Dvm.logging.name=SparkStreaming-1" \
--class org.apache.spark.examples.SparkPi \
/usr/hdp/current/spark2-client/examples/jars/spark-examples_*.jar 1000
After running the Spark Streaming application, the rolled log files will be present on the NodeManager nodes where an executor is launched, which makes it easier to find and collect the necessary executor logs. The Resource Manager UI will also list the current log and any previous (backup) files.
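To locate those rolled files directly on a NodeManager host, one option is to search the YARN container log directory. The path below is only an assumption (the HDP default yarn.nodemanager.log-dirs of /hadoop/yarn/log) and reuses the application name from the example above; once the application has finished and log aggregation has run, the standard yarn logs command can be used instead:
# On a NodeManager host: find the rolled driver/executor logs for this application (path is an assumption)
find /hadoop/yarn/log -name "SparkStreaming-1-*.log*" 2>/dev/null
# After the application has finished, fetch the aggregated logs
yarn logs -applicationId <application_id>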
12-10-2019
03:50 AM
Ambari Infra Solr can be present on HDP and HDF clusters. HDPSearch Solr is a separate product that can be installed on top of an HDP cluster.
This video talks about the purpose of each, the differences between them, and how to correctly select the product category when opening a support case.
Open YouTube video here
12-10-2019
03:49 AM
This article describes the steps required to complete the setup for accessing Grafana over HTTPS with CA-signed certificates.
Open YouTube video here
Ambari Metrics System includes Grafana, a daemon that runs on a specific host in the cluster and serves pre-built dashboards for visualising metrics collected by the Metrics Collector. For this article, the following servers are used:
172.25.33.152 c3132-node1.user.local (Ambari Server)
172.25.36.9 c3132-node2.user.local (Ambari Metrics Collector + Grafana)
172.25.40.27 c3132-node3.user.local (Ambari Metrics Collector)
172.25.33.163 c3132-node4.user.local
By default, Grafana listens on port TCP/3000:
# for i in $(netstat -utnlp | awk '/grafana/ {print substr($7, 1, length($7)-13)}' |
sort -u) ; do echo ; ps -eo pid,user,command --cols 128 | grep $i | grep -v grep ;
netstat -utnlp | grep $i ; echo ; done
270925 ams /usr/lib/ambari-metrics-grafana/bin/grafana-server
--pidfile=/var/run/ambari-metrics-grafana/grafana-server.pid
tcp6 0 0 :::3000 :::*
LISTEN 270925/grafana-serv
Here, the running process is grafana-server, the owner is ams, and it is listening on port TCP/3000. All Grafana configurations are handled by Ambari and are reflected in the ams-grafana.ini file located in the /etc/ambari-metrics-grafana/conf/ directory. Grafana needs to be restarted for any configuration change to take effect. In enterprises where security is required, limit Grafana access to HTTPS connections only. To enable HTTPS for Grafana, update the following properties:
AmbariUI / Services / Ambari Metrics / Configs -> Advanced ams-grafana-ini
protocol: http by default. For this video, it needs to be changed to https.
ca_cert: The path to the CA root certificate or bundle used to validate the Grafana certificate. Since a PKCS#12 bundle certificate is used here, the CA certificate chain needs to be extracted from it.
cert_file: The path to the certificate. This certificate needs to be in PEM format.
cert_key: The path to the private key that matches the public key of the certificate. This private key needs to be an unencrypted RSA private key.
For this article, the CA will provide us with a certificate bundle located at:
/var/tmp/certificates/GRAFANA
Since the certificate information provided by the CA is a PKCS#12 certificate bundle, complete the following steps:
Extract the root and intermediate certificates, using the following command:
openssl pkcs12 -in c3132-node2.user.local.p12 -out ams-ca.crt -cacerts -nokeys \
  -passin pass:hadoop1234
Extract the server certificate:
openssl pkcs12 -in c3132-node2.user.local.p12 -out ams-grafana.crt -clcerts \
  -nokeys -passin pass:hadoop1234
Extract the private key:
openssl pkcs12 -in c3132-node2.user.local.p12 -nocerts -nodes -out ams-grafana.key \
  -passin pass:hadoop1234
Copy the certificates to a folder with ams user permissions. For this article, the default path and the default names are the following:
cp ams-*.* /etc/ambari-metrics-grafana/conf/
chown ams:hadoop /etc/ambari-metrics-grafana/conf/ams-*.*
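Optionally, before updating the configuration in Ambari, it can be confirmed that the extracted private key actually matches the server certificate; the two digests below should be identical. This check is a suggestion and not part of the original steps:
openssl x509 -noout -modulus -in /etc/ambari-metrics-grafana/conf/ams-grafana.crt | openssl md5
openssl rsa -noout -modulus -in /etc/ambari-metrics-grafana/conf/ams-grafana.key | openssl md5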
Update the Grafana configuration from Ambari:
AmbariUI / Services / Ambari Metrics / Configs -> Advanced ams-grafana-ini
protocol = https
ca_cert = /etc/ambari-metrics-grafana/conf/ams-ca.crt
cert_file = /etc/ambari-metrics-grafana/conf/ams-grafana.crt
cert_key = /etc/ambari-metrics-grafana/conf/ams-grafana.key
Save the changes, and restart all affected services.
Double-check the Grafana log file:
tail -f /var/log/ambari-metrics-grafana/grafana.log
The following log entry confirms that Grafana is listening on port 3000 over HTTPS:
2018/12/12 03:42:41 [I] Listen: https://0.0.0.0:3000
Double-check the certificate in place using:
openssl s_client -connect c3132-node2.user.local:3000 </dev/null
Open Grafana from Ambari to validate that it is working as expected:
AmbariUI / Services / Ambari Metrics / Summary -> Quick Link Grafana
Note: Ignore the warning and proceed.
With all these steps, Grafana is configured to use CA-signed certificates, and the communication is over HTTPS.
12-10-2019
03:45 AM
This video contains a step-by-step process that shows how to connect to Hive running on a secure cluster using a JDBC Uber driver from MS Windows.
Open the video on YouTube here
Prerequisites:
Validate that the username belongs to the same @DOMAIN/realm as the one set up on the cluster nodes.
Install DbVisualizer.
Download the Hive Uber driver for the same version as HDP.
Kerberos Java Config
Get the Kerberos /etc/krb5.conf from the cluster; scp this file from any cluster node to c:\windows.
Rename krb5.conf to c:\windows\krb5.ini.
Edit krb5.ini and add the following property (see video):
udp_preference_limit = 1
DbVisualizer Setup
Add the Hive Uber driver .jar to DbVisualizer as the driver for Hive (see video).
Add startup Java command line options to DbVisualizer for Kerberos, under Tools (see video):
-Dsun.security.krb5.debug=true
-Djavax.security.auth.useSubjectCredsOnly=false
-Djava.security.krb5.conf=c:\windows\krb5.ini
Set up a new connection for the HDP cluster:
Database Server = (hostname of the node running HiveServer2)
Database = change from default to default;principal=hive/_HOST@DOMAIN.COM
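The resulting JDBC URL that DbVisualizer builds from these fields would look roughly like the following; the hostname and port are assumptions (HiveServer2's default binary port is 10000) and must be replaced with your own values:
jdbc:hive2://<hiveserver2-host>:10000/default;principal=hive/_HOST@DOMAIN.COM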
Get a Kerberos session ticket from the DbVisualizer Java JRE:
cd to the folder where DbVisualizer is installed and locate the \jre\bin folder, then run:
kinit username
OR, if a keytab file is used:
kinit -kt keytab username@DOMAIN.COM
Check for the session in the cache file with:
klist
Restart DbVisualizer and test the connection to Hive.
12-10-2019
03:42 AM
This video explains how to configure Ambari Metrics System (AMS) high availability.
Open YouTube video here
To enable AMS high availability, the Collector has to be configured to run in distributed mode. When the Collector is configured for distributed mode, it writes metrics to HDFS, and the components run as distributed processes, which helps to manage CPU and memory.
The following steps assume a cluster configured with a highly available NameNode.
Set the HBase Root directory value to use the HDFS name service instead of the NameNode hostname.
Migrate existing data from the local store to HDFS prior to switching to a distributed mode.
To switch the Metrics Collector from embedded mode to distributed mode, update the Metrics Service operation mode and the location where the metrics are stored. In summary, the following steps are required:
Stop the Ambari Metrics System
Prepare the environment to migrate from the local file system to HDFS
Migrate Collector data to HDFS
Configure distributed mode using Ambari
Restart all affected services and monitor the Collector log
Stop all the services associated with the AMS component using Ambari
AmbariUI / Services / Ambari Metrics / Summary / Action / Stop
Prepare the Environment to migrate from Local File System to HDFS
AMS_User=ams
AMS_Group=hadoop
AMS_Embedded_RootDir=$(grep -C 2 hbase.rootdir /etc/ambari-metrics-collector/conf/hbase-site.xml | awk -F"[<|>|:]" '/value/ {print $4}' | sed 's|//||1')
ActiveNN=$(su -l hdfs -c "hdfs haadmin -getAllServiceState | awk -F '[:| ]' '/active/ {print \$1}'")
NN_Port=$(su -l hdfs -c "hdfs haadmin -getAllServiceState | awk -F '[:| ]' '/active/ {print \$2}'")
HDFS_Name_Service=$(grep -A 1 dfs.nameservice /etc/hadoop/conf/hdfs-site.xml | awk -F"[<|>]" '/value/ {print $3}')
HDFS_AMS_PATH=/apps/ams/metrics
Create the folder for Collector data in HDFS
su -l hdfs -c "hdfs dfs -mkdir -p ${HDFS_AMS_PATH}"
su -l hdfs -c "hdfs dfs -chown ${AMS_User}:${AMS_Group} ${HDFS_AMS_PATH}"
Update permissions to be able to copy collector data from local file system to HDFS
namei -l ${AMS_Embedded_RootDir}/staging
chmod +rx ${AMS_Embedded_RootDir}/staging
Copy collector data from local file system to HDFS
su -l hdfs -c "hdfs dfs -copyFromLocal ${AMS_Embedded_RootDir} hdfs://${ActiveNN}:${NN_Port}${HDFS_AMS_PATH}"
su - hdfs -c "hdfs dfs -chown -R ${AMS_User}:${AMS_Group} ${HDFS_AMS_PATH}"
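Before switching the operation mode, it can be useful to confirm that the Collector data landed in HDFS with the expected ownership; this verification is a suggestion and not part of the original procedure:
su -l hdfs -c "hdfs dfs -ls ${HDFS_AMS_PATH}"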
Configure the Collector to distributed mode using Ambari:
AmbariUI / Services / Ambari Metrics / Configs / Metrics Service operation mode =
distributed
AmbariUI / Services / Ambari Metrics / Configs / Advanced ams-hbase-site /
hbase.cluster.distributed = true
AmbariUI / Services / Ambari Metrics / Configs / Advanced ams-hbase-site /
HBase root directory = hdfs://AMSHA/apps/ams/metrics
AmbariUI / Services / HDFS / Configs / Custom core-site
hadoop.proxyuser.hdfs.groups = *
hadoop.proxyuser.root.groups = *
hadoop.proxyuser.hdfs.hosts = *
hadoop.proxyuser.root.hosts = *
AmbariUI / Services / HDFS / Configs / HDFS Short-circuit read /Advanced
hdfs-site = true (check)
AmbariUI -> Restart All required
Note: Impersonation is the ability to allow a service user to securely access data in Hadoop on behalf of another user. When proxy users are configured, any access through a proxy is executed with the impersonated user's existing privilege levels rather than those of a superuser such as HDFS. The behavior is similar for proxy hosts: they limit the hosts from which impersonated connections are allowed. For this article and for testing purposes, all users and all hosts are allowed.
Additionally, one of the key principles behind Apache Hadoop is the idea that moving computation is cheaper than moving data. With short-circuit local reads, since the client and the data are on the same node, there is no need for the DataNode to be in the data path; the client itself can simply read the data from the local disk, improving performance.
Once AMS is up and running, the following message is displayed in the Metrics Collector log:
2018-12-12 01:21:12,132 INFO org.eclipse.jetty.server.Server: Started @14700ms
2018-12-12 01:21:12,132 INFO org.apache.hadoop.yarn.webapp.WebApps: Web app timeline started at 6188
2018-12-12 01:21:40,633 INFO org.apache.ambari.metrics.core.timeline.availability.MetricCollectorHAController:
######################### Cluster HA state ########################
CLUSTER: ambari-metrics-cluster
RESOURCE: METRIC_AGGREGATORS
PARTITION: METRIC_AGGREGATORS_0 c3132-node2.user.local_12001 ONLINE
PARTITION: METRIC_AGGREGATORS_1 c3132-node2.user.local_12001 ONLINE
##################################################
According to the above message, there is a cluster with only one Collector. The next logical step is to add an additional Collector from the Ambari Server. To do this, go to:
AmbariUI / Hosts / c3132-node3.user.local / Summary -> +ADD -> Metrics Collector
Note: c3132-node3.user.local is the node where you will be adding the Collector. Since distributed mode is already enabled, after adding the collector, start the service. Once the AMS is up and running, the following message is displayed in the Metrics Collector Log:
2018-12-12 01:34:56,060 INFO org.apache.ambari.metrics.core.timeline.availability.MetricCollectorHAController:
######################### Cluster HA state ########################
CLUSTER: ambari-metrics-cluster
RESOURCE: METRIC_AGGREGATORS
PARTITION: METRIC_AGGREGATORS_0 c3132-node2.user.local_12001 ONLINE
PARTITION: METRIC_AGGREGATORS_1 c3132-node3.user.local_12001 ONLINE
##################################################
According to the above message, the cluster has two collectors.