Member since
12-02-2015
42
Posts
28
Kudos Received
3
Solutions
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 765 | 04-03-2018 03:30 AM
 | 482 | 04-25-2017 09:27 PM
 | 3447 | 03-22-2017 04:45 PM
06-27-2018
11:54 PM
1 Kudo
The current version of Ambari doesn't monitor the number of HiveServer2 connections. We often see HiveServer2 slowness under heavy load caused by an increase in the number of connections to HiveServer2. Setting up an alert on HiveServer2 established connections helps us take the required actions, such as adding additional HiveServer2 instances, load balancing properly, or scheduling the jobs. NOTE: Please go through this article https://github.com/apache/ambari/blob/2.6.2-maint/ambari-server/docs/api/v1/alert-definitions.md to understand Ambari Alert Definitions. Please find the python script and .json file used below in the attachments.
alert_hiveserver_num_connection.py - the python script that finds the current established connections for each HiveServer2 and, based on the number of connections, returns a 'CRITICAL', 'WARN', or 'OK' alert.
alerths.json - the Ambari Alert definition.
Below are the steps to set up the Ambari alert on HiveServer2 established connections.
Step 1 - Place the file "alert_hiveserver_num_connection.py" in the following path on the ambari-server: "/var/lib/ambari-server/resources/common-services/HIVE/0.12.0.2.0/package/alerts/"
[root@vb-atlas-ambari tmp]# cp alert_hiveserver_num_connection.py /var/lib/ambari-server/resources/common-services/HIVE/0.12.0.2.0/package/alerts/
Step 2 - Restart Ambari Server to force the Ambari agents to pull the alert_hiveserver_num_connection.py script to every host.
ambari-server restart
Once Ambari Server is restarted, we can verify that alert_hiveserver_num_connection.py is available in "/var/lib/ambari-agent/cache/common-services/HIVE/0.12.0.2.0/package/alerts/" on the HiveServer2 host.
Note: Sometimes it takes longer for the Ambari agent to pull the script from the Ambari server.
[root@vb-atlas-node1 ~]# ll /var/lib/ambari-agent/cache/common-services/HIVE/0.12.0.2.0/package/alerts/
total 116
-rw-r--r--. 1 root root 9740 Jun 27 17:01 alert_hive_interactive_thrift_port.py
-rw-r--r--. 1 root root 7893 Jun 27 17:01 alert_hive_interactive_thrift_port.pyo
-rw-r--r--. 1 root root 9988 Jun 27 17:01 alert_hive_metastore.py
-rw-r--r--. 1 root root 9069 Jun 27 17:01 alert_hive_metastore.pyo
-rw-r--r--. 1 root root 1888 Jun 27 17:01 alert_hiveserver_num_connection.py
-rw-r--r--. 1 root root 11459 Jun 27 17:01 alert_hive_thrift_port.py
-rw-r--r--. 1 root root 9362 Jun 27 17:01 alert_hive_thrift_port.pyo
-rw-r--r--. 1 root root 11946 Jun 27 17:01 alert_llap_app_status.py
-rw-r--r--. 1 root root 9339 Jun 27 17:01 alert_llap_app_status.pyo
-rw-r--r--. 1 root root 8886 Jun 27 17:01 alert_webhcat_server.py
-rw-r--r--. 1 root root 6563 Jun 27 17:01 alert_webhcat_server.pyo
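For reference, the check the script performs amounts to counting established TCP connections on the HiveServer2 port; a quick manual equivalent on a HiveServer2 host (assuming the default binary port 10000, adjust for your config) looks like this:
# Count established connections to HiveServer2 (default binary port 10000)
netstat -ant | awk '$4 ~ /:10000$/ && $6 == "ESTABLISHED"' | wc -l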
Step 3 - Post the Alert Definition (alerths.json) to Ambari using curl:
curl -u <Ambari_admin_username>:<Ambari_admin_password> -i -H 'X-Requested-By:ambari' -X POST -d @alerths.json http://<AMBARI_HOST>:<AMBARI_PORT>/api/v1/clusters/<CLUSTER_NAME>/alert_definitions
Example :
[root@vb-atlas-ambari tmp]# curl -u admin:admin -i -H 'X-Requested-By:ambari' -X POST -d @alerths.json http://172.26.108.142:8080/api/v1/clusters/vinod/alert_definitions
HTTP/1.1 201 Created
X-Frame-Options: DENY
X-XSS-Protection: 1; mode=block
X-Content-Type-Options: nosniff
Cache-Control: no-store
Pragma: no-cache
Set-Cookie: AMBARISESSIONID=10f33laf224yy1834ygq9cekbo;Path=/;HttpOnly
Expires: Thu, 01 Jan 1970 00:00:00 GMT
User: admin
Content-Type: text/plain
Content-Length: 0
We should now be able to see the alert in Ambari -> Alerts (HiveServer2 Established Connections). Alternatively, we can also see "HiveServer2 Established Connections" listed in the alert definitions at "http://<AMBARI_HOST>:<AMBARI_PORT>/api/v1/clusters/<CLUSTER_NAME>/alert_definitions".
Step 4 - As per the Alert Definition (alerths.json), the CRITICAL alert is set to 50 connections and WARNING to 30 connections by default. You can update these thresholds directly from Ambari by editing the alert.
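To confirm the definition was registered from the command line, listing the alert definitions and filtering on the label works as well (same credential and host placeholders as above):
curl -s -u admin:admin -H 'X-Requested-By:ambari' "http://<AMBARI_HOST>:<AMBARI_PORT>/api/v1/clusters/<CLUSTER_NAME>/alert_definitions" | grep -i "HiveServer2 Established Connections"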
06-14-2018
05:35 PM
2 Kudos
In DPS-1.1.0 we can't remove a cluster from the DPS UI, but we can use curl to remove it. Note: a user with the Dataplane Admin role can perform the steps below. (screenshot: screen-shot-2018-06-14-at-102253-am.png)
To delete the smayani-hdp cluster:
Step 1 - Find the ID of the cluster you want to remove. You can use the developer tools in the browser to find the cluster ID. (screenshot: screen-shot-2018-06-14-at-101629-am.png) In the above example the smayani-hdp cluster ID is 3 ( https://172.26.125.109/api/lakes/3/servicesDetails ).
Step 2 - From the console, use the below curl to remove the cluster.
curl -k -u <username>:<Password> -X DELETE https://<DPS_HOST>/api/lakes/<cluster_ID>
Example : curl -k -u admin:kjncsadasdcsdc -X DELETE https://172.26.125.109/api/lakes/3
Once the above is executed you should no longer see the cluster in the UI. (screenshot: screen-shot-2018-06-14-at-102509-am.png)
Alternatively, you can also use rm_dp_cluster.sh in /usr/dp/current/core/bin on the server where DPS is installed.
Usage: ./rm_dp_cluster.sh DP_JWT HADOOP_JWT DP_HOST_NAME CLUSTER_NAME DATA_CENTER_NAME
DP_JWT: Value of the dp_jwt cookie from a valid user's browser session
HADOOP_JWT: Value of the hadoop-jwt cookie from a valid user's browser session
DP_HOST_NAME: Hostname or IP address of the DataPlane server
CLUSTER_NAME: Name of the cluster to delete
DATA_CENTER_NAME: Name of the datacenter the cluster belongs to
You can use the browser developer tools to find the cookie values (DP_JWT, HADOOP_JWT).
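A worked invocation with the placeholders filled in (the cookie values and datacenter name below are hypothetical examples) would look like:
cd /usr/dp/current/core/bin
./rm_dp_cluster.sh "<dp_jwt_cookie_value>" "<hadoop_jwt_cookie_value>" 172.26.125.109 smayani-hdp <data_center_name>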
06-11-2018
03:37 AM
@Abhiram Gattamaneni The HDP distribution doesn't support the MapR filesystem, nor does it ship the MapR client jars. The above error has nothing to do with a Kerberized environment. My suggestion would be to have a MapR NFS mount on the client node and use hadoop copyFromLocal to copy the file to HDFS, and vice versa.
06-06-2018
09:35 PM
OBJECTIVE: Update the log configs of the DPS App. For example, the default log file is set to logs/application.log, which can be changed, or the log level can be raised to DEBUG for troubleshooting. Since the DP App runs in docker, we can use docker commands to update the configs.
STEPS:
1. Find the docker container running the DP App on the host running DPS, using "docker ps".
[root@dps-node ~]# docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
abd412417907 hortonworks/dlm-app:1.1.0.0-41 "runsvdir /etc/sv" 28 hours ago Up 2 hours 9011/tcp dlm-app
62620e578e31 hortonworks/dp-app:1.1.0.0-390 "/bootstrap.sh" 2 days ago Up 16 minutes 0.0.0.0:80->80/tcp, 0.0.0.0:443->443/tcp, 9000/tcp dp-app
38dda17dfdf4 hortonworks/dp-cluster-service:1.1.0.0-390 "./docker_service_st…" 2 days ago Up 2 days 9009-9010/tcp
Copy the container ID; from the above example it is "62620e578e31".
2. Get the current logback.xml file.
[root@dps-node ~]# docker exec -it 62620e578e31 /bin/cat /usr/dp-app/conf/logback.xml > logback.xml
3. Update the configs in the local logback.xml we redirected to in the above command. Below, I have updated the location from the default logs/application.log to /usr/dp-app/logs/.
<configuration>
<appender name="FILE" class="ch.qos.logback.core.rolling.RollingFileAppender">
<file>/usr/dp-app/logs/application.log</file>
<rollingPolicy class="ch.qos.logback.core.rolling.TimeBasedRollingPolicy">
<!-- Daily rollover with compression -->
.
.
<appender name="AKKA" class="ch.qos.logback.core.rolling.RollingFileAppender">
<file>/usr/dp-app/logs/akka.log</file>
<rollingPolicy class="ch.qos.logback.core.rolling.TimeBasedRollingPolicy">
.
.
.
</encoder>
</appender>
<appender name="ACCESS_FILE" class="ch.qos.logback.core.rolling.RollingFileAppender">
<file>/usr/dp-app/logs/access.log</file>
<rollingPolicy class="ch.qos.logback.core.rolling.TimeBasedRollingPolicy">
.
.
We can also update the log level.
<root level="DEBUG">
<appender-ref ref="FILE"/>
</root>
4. If needed, make a backup of the original logback.xml file, then copy the updated logback.xml into the container.
[root@dps-node ~]# docker exec -it 62620e578e31 /bin/cp /usr/dp-app/conf/logback.xml /usr/dp-app/conf/logback.xml.bck
[root@dps-node ~]# docker exec -i 62620e578e31 tee /usr/dp-app/conf/logback.xml < logback.xml
5. Restart the docker container to make the changes effective.
[root@dps-node ~]# docker restart 62620e578e31
6. Verify that the changes have been applied.
[root@dps-node ~]# docker exec -it 62620e578e31 /bin/ls -lrt /usr/dp-app/logs
total 64
-rw-r--r-- 1 root root 0 Jun 6 20:50 access.log
-rw-r--r-- 1 root root 62790 Jun 6 21:27 application.log
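Tailing the relocated log is a quick way to confirm the app is writing to the new path (same container ID as above):
[root@dps-node ~]# docker exec -it 62620e578e31 /bin/tail -n 20 /usr/dp-app/logs/application.log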
06-04-2018
05:52 PM
1 Kudo
Short Description: Describes how to manually regenerate keytabs for services through the Ambari REST API.
Article
Make sure the KDC credentials are added to the Ambari credentials store; you can follow this article to do so. Once the KDC credentials are added, you can use the below Ambari REST API call to regenerate keytabs.
curl -H "X-Requested-By:ambari" -u <Ambari_Admin_username>:<Ambari_Admin_password> -X PUT -d '{ "Clusters": { "security_type" : "KERBEROS" } }' "http://<Ambari_HOST>:8080/api/v1/clusters/<Cluster_Name>/?regenerate_keytabs=all"
Example :
curl -H "X-Requested-By:ambari" -u admin:admin -X PUT -d '{ "Clusters": { "security_type" : "KERBEROS" } }' "http://172.26.108.142:8080/api/v1/clusters/vinod/?regenerate_keytabs=all&ignore_config_updates=true"
(Quote the URL so the shell does not interpret the & in the query string.) Once the keytabs are regenerated, the services need to be restarted to use the newly generated keytabs.
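The PUT starts a background operation; its progress can be tracked with the Ambari requests API (the request id below is a placeholder taken from the response of the PUT call):
curl -u admin:admin -H "X-Requested-By:ambari" "http://<Ambari_HOST>:8080/api/v1/clusters/<Cluster_Name>/requests/<request_id>"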
04-23-2018
10:19 PM
Problem Description
Atlas uses Solr to store lineage metadata and uses Zookeeper for coordination and for storing/maintaining configuration. Due to heavy load on Zookeeper on larger clusters, we need to increase the ZK session timeout from the default for some services. One such config is the Ambari Infra (Solr) Zookeeper timeout on the Atlas side.
ERROR:
java.util.concurrent.TimeoutException: Could not connect to ZooKeeper vb-hortonwork.com:2181/infra-solr within 15000 ms
RESOLUTION:
We can increase the timeouts from the default 15000 ms by adding the below properties to custom application-properties in Atlas -> Configs:
atlas.graph.index.search.solr.zookeeper-connect-timeout=60000
atlas.graph.index.search.solr.zookeeper-session-timeout=60000
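If you prefer the command line over the Ambari UI, Ambari's configs.sh script can set these properties in the application-properties config type (the script path and config type name are typical HDP defaults; adjust credentials, host, and cluster name for your environment):
/var/lib/ambari-server/resources/scripts/configs.sh -u admin -p admin set <AMBARI_HOST> <CLUSTER_NAME> application-properties "atlas.graph.index.search.solr.zookeeper-connect-timeout" "60000"
/var/lib/ambari-server/resources/scripts/configs.sh -u admin -p admin set <AMBARI_HOST> <CLUSTER_NAME> application-properties "atlas.graph.index.search.solr.zookeeper-session-timeout" "60000"
A restart of Atlas is still required for the new values to take effect.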
04-03-2018
03:30 AM
1 Kudo
@Saikiran Parepally It's been fixed in HDF-3.1. Please use the nifi.web.proxy.host property to add the hosts.
01-26-2018
06:05 PM
2 Kudos
Generally we use chown/chmod to change ownership and permissions. When we run chown/chmod on a directory that contains a few million objects it takes a very long time, sometimes even days. To reduce the time and make the changes in one command instead of two (chown and chmod), you can use DistCh, which is faster than regular chown and chmod because it runs as a distributed MapReduce job. Below is the command to invoke DistCh, followed by its usage (see the worked example after the option list).
hadoop jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-extras.jar org.apache.hadoop.tools.DistCh
java org.apache.hadoop.tools.DistCh [OPTIONS] <path:owner:group:permission>
The values of owner, group and permission can be empty.
Permission is a octal number.
OPTIONS:
-f <urilist_uri> Use list at <urilist_uri> as src list
-i Ignore failures
-log <logdir> Write logs to <logdir>
Generic options supported are
-conf <configuration file> specify an application configuration file
-D <property=value> use value for given property
-fs <local|namenode:port> specify a namenode
-jt <local|resourcemanager:port> specify a ResourceManager
-files <comma separated list of files> specify comma separated files to be copied to the map reduce cluster
-libjars <comma separated list of jars> specify comma separated jar files to include in the classpath.
-archives <comma separated list of archives> specify comma separated archives to be unarchived on the compute machines.
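A worked example (the path, owner, group, and mode below are hypothetical; the path:owner:group:permission format is as described above, and any field may be left empty to keep its current value):
# Change owner to hdfs, group to hadoop, and permissions to 755 under /data/warehouse
hadoop jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-extras.jar org.apache.hadoop.tools.DistCh /data/warehouse:hdfs:hadoop:755
# Change only the owner, leaving group and permissions untouched
hadoop jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-extras.jar org.apache.hadoop.tools.DistCh /data/warehouse:hdfs::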
11-08-2017
08:41 PM
4 Kudos
Short Description: The Spark HBase Connector (SHC) is currently hosted in the Hortonworks repo and published as a Spark package.
Below is a simple example of how to access an HBase table in the Spark shell and load the data into a DataFrame. Once the data is in a DataFrame, we can use SQLContext to run queries on it.
Article
The documentation here leaves out a few pieces needed to access HBase tables using SHC with the Spark shell, so here is an example of accessing the HBase "emp" table in the Spark shell.
HBase Shell
Create a simple "emp" HBase table using the HBase shell and insert sample data:
create 'emp', 'personal data', 'professional data'
put 'emp','1','personal data:name','raju'
put 'emp','1','personal data:city','hyderabad'
put 'emp','1','professional data:designation','manager'
put 'emp','1','professional data:salary','50000'
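Before moving to Spark, a quick non-interactive scan confirms the rows landed in HBase as expected:
echo "scan 'emp'" | hbase shell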
Once created, exit the HBase shell and run the Spark shell, providing the SHC package and hbase-site.xml:
/usr/hdp/current/spark-client/bin/spark-shell --packages zhzhan:shc:0.0.11-1.6.1-s_2.10 --files /etc/hbase/conf/hbase-site.xml
Import the required classes:
scala> import org.apache.spark.sql.{SQLContext, _}
import org.apache.spark.sql.{SQLContext, _}
scala> import org.apache.spark.sql.execution.datasources.hbase._
import org.apache.spark.sql.execution.datasources.hbase._
scala> import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.{SparkConf, SparkContext}
Define the HBase catalog for mapping the table; the rowkey is also defined as a column (empNumber) with a specific column family (rowkey).
scala> def empcatalog = s"""{
"table":{"namespace":"default", "name":"emp"},
"rowkey":"key",
"columns":{
"empNumber":{"cf":"rowkey", "col":"key", "type":"string"},
"city":{"cf":"personal data", "col":"city", "type":"string"},
"empName":{"cf":"personal data", "col":"name", "type":"string"},
"jobDesignation":{"cf":"professional data", "col":"designation", "type":"string"},
"salary":{"cf":"professional data", "col":"salary", "type":"string"}
}
}""".stripMargin
Perform DataFrame operations on top of the HBase table: first we define a helper, then load the data into a DataFrame.
scala> def withCatalog(empcatalog: String): DataFrame = {
sqlContext
.read
.options(Map(HBaseTableCatalog.tableCatalog->empcatalog))
.format("org.apache.spark.sql.execution.datasources.hbase")
.load()
}
withCatalog: (empcatalog: String)org.apache.spark.sql.DataFrame
scala> val df = withCatalog(empcatalog)
df: org.apache.spark.sql.DataFrame = [city: string, empName: string, jobDesignation: string, salary: string, empNumber: string]
scala> df.show
17/11/08 18:04:22 INFO RecoverableZooKeeper: Process identifier=hconnection-0x55a690be connecting to ZooKeeper ensemble=vb-atlas-node1.hortonworks.com:2181,vb-atlas-node2.hortonworks.com:2181,vb-atlas-ambari.hortonworks.com:2181
17/11/08 18:04:22 INFO ZooKeeper: Client environment:zookeeper.version=3.4.6-8--1, built on 04/01/201
.
.
.
17/11/08 18:04:24 INFO DAGScheduler: ResultStage 0 (show at <console>:39) finished in 1.011 s
17/11/08 18:04:24 INFO DAGScheduler: Job 0 finished: show at <console>:39, took 1.230151 s
+---------+-------+--------------+------+---------+
| city|empName|jobDesignation|salary|empNumber|
+---------+-------+--------------+------+---------+
| chennai| ravi| manager| 50000| 1|
|hyderabad| raju| engineer| null| 2|
| delhi| rajesh| jrenginner| null| 3|
+---------+-------+--------------+------+---------+
We can query the DataFrame using sqlContext.
scala> df.registerTempTable("table")
scala>sqlContext.sql("select empNumber,jobDesignation from table").show
+---------+--------------+
|empNumber|jobDesignation|
+---------+--------------+
| 1| manager|
| 2| engineer|
| 3| jrenginner|
+---------+--------------+
Reference : https://hortonworks.com/blog/spark-hbase-dataframe-based-hbase-connector/ https://github.com/hortonworks-spark/shc/blob/master/examples/src/main/scala/org/apache/spark/sql/execution/datasources/hbase/HBaseSource.scala
09-15-2017
06:28 PM
@Sree Kupp It is hard to explain without the datanode and namenode logs. Check the datanode logs.
09-14-2017
11:01 PM
@Sree Kupp The above just means the client disconnected after it finished writing. The issue has been fixed in the latest HDP 2.5. Can you check the live nodes in the NameNode UI? Are there 4?
09-06-2017
03:25 PM
@Nick Price Which version ?
09-06-2017
03:21 PM
@M K First you need to enable JMX for Kafka by editing bin/kafka-run-class.sh and setting:
KAFKA_JMX_OPTS="-Dcom.sun.management.jmxremote=true -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Djava.rmi.server.hostname=your.kafka.broker.hostname -Djava.net.preferIPv4Stack=true"
Then update bin/kafka-server-start.sh to add the line below, which sets the JMX port used together with the KAFKA_JMX_OPTS variable:
export JMX_PORT=PORT
Once done, you can follow https://grafana.com/dashboards/721 for Kafka Grafana dashboard metrics.
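For example, with a concrete (hypothetical) port, and a quick check that the broker is listening on it after a restart:
export JMX_PORT=9999
netstat -tlnp | grep 9999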
05-18-2017
03:02 PM
@Juan Manuel Nieto Yes, if it's a kerberized environment you need to provide the keytab to authenticate. Since you are using a shell action you can use kinit too.
05-09-2017
07:58 PM
Hi @Gilsomar N Resende Did you install Hbase and Ambari infra (Solr) on the cluster before Atlas?
05-05-2017
03:30 PM
@Saba Baig Are you using Apache Hadoop or HDP? To get the whole hadoop classpath, you just need to run the following command on your machine; you can copy the output and pass it to HADOOP_CLASSPATH.
hadoop classpath
Example :
[root@vb-atlas-node2 ~]# hadoop classpath
/usr/hdp/2.6.0.3-8/hadoop/conf:/usr/hdp/2.6.0.3-8/hadoop/lib/*:/usr/hdp/2.6.0.3-8/hadoop/.//*:/usr/hdp/2.6.0.3-8/hadoop-hdfs/./:/usr/hdp/2.6.0.3-8/hadoop-hdfs/lib/*:/usr/hdp/2.6.0.3-8/hadoop-hdfs/.//*:/usr/hdp/2.6.0.3-8/hadoop-yarn/lib/*:/usr/hdp/2.6.0.3-8/hadoop-yarn/.//*:/usr/hdp/2.6.0.3-8/hadoop-mapreduce/lib/*:/usr/hdp/2.6.0.3-8/hadoop-mapreduce/.//*::mysql-connector-java-5.1.17.jar:mysql-connector-java.jar:/usr/hdp/2.6.0.3-8/tez/*:/usr/hdp/2.6.0.3-8/tez/lib/*:/usr/hdp/2.6.0.3-8/tez/conf
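Or set it directly in one step:
export HADOOP_CLASSPATH=$(hadoop classpath)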
To install hadoop client and hive client you can go to Ambari and select hosts -> select the atlas hostname -> components -> Add ( hive client, hadoop client )
05-04-2017
04:54 PM
@Saba Baig
Try installing the Hive client on the Atlas server and running import-hive.sh from there. Hive metadata is imported using the import-hive.sh command, and the script needs the Hadoop and Hive classpath jars.
* For Hadoop jars, please make sure that the environment variable HADOOP_CLASSPATH is set. Another way is to set HADOOP_HOME to point to the root directory of your Hadoop installation.
* Similarly, for Hive jars, set HIVE_HOME to the root of the Hive installation.
* Set the environment variable HIVE_CONF_DIR to the Hive configuration directory.
* Copy <atlas-conf>/atlas-application.properties to the hive conf directory.
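A minimal sketch of that environment setup on an HDP node before invoking the script (the hive-client and conf paths are typical HDP defaults, and the import-hive.sh location varies by Atlas version, so treat these paths as assumptions):
export HADOOP_CLASSPATH=$(hadoop classpath)
export HIVE_HOME=/usr/hdp/current/hive-client
export HIVE_CONF_DIR=/etc/hive/conf
cp <atlas-conf>/atlas-application.properties /etc/hive/conf/
<atlas-install-dir>/hook-bin/import-hive.sh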
05-04-2017
04:51 PM
@Geoffrey Shelton Okot By default Ambari will take care of this, if the install was done using Ambari.
05-04-2017
04:49 PM
@Geoffrey Shelton Okot Try this. It looks like a typo: --allow-principals should be --allow-principal.
/usr/hdp/current/kafka-broker/bin/kafka-acls.sh --topic ATLAS_HOOK --allow-principal user:<atlas_user> --operations All --authorizer-properties "zookeeper.connect=gateway-maxwell.com:2181,namenode-maxwell.com:2181,namenode-maxwell.com:2181"
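You can verify the resulting ACLs afterwards with the --list action of the same tool (a sketch, reusing the same zookeeper.connect string):
/usr/hdp/current/kafka-broker/bin/kafka-acls.sh --list --topic ATLAS_HOOK --authorizer-properties "zookeeper.connect=gateway-maxwell.com:2181,namenode-maxwell.com:2181,namenode-maxwell.com:2181"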
04-26-2017
04:45 PM
@Yanying Gu Can you check if Ambari Infra (Solr) is up? If yes, since it is a Kerberized environment, obtain a SPNEGO ticket and then try to access the Solr UI.
04-25-2017
09:27 PM
@Ward Bekker With the AccessController coprocessor for HBase, only a global administrator can take, clone, or restore a snapshot, and these actions do not capture the ACL rights. This means that restoring a table preserves the ACL rights of the existing table, while cloning a table creates a new table that has no ACL rights until the administrator adds them.
04-25-2017
06:18 PM
@HadoopAdmin India Answering your first question: "What I found is both x and y are able to see all the columns and the tag-based policy does not seem to work." --> This happens when any one Ranger policy grants the permission: since you created a resource-based Ranger policy giving both x and y access in the first place, that policy grants both users access to all columns regardless of the tag-based policy. Try removing that Ranger policy and only user y will be able to access the column.
Question 2: What is the use of AD integration in Atlas? How are AD users used in Atlas? --> You can sync your AD users so they can access the Atlas UI directly and track data governance.
Question 3: What is the Hive hook, and can someone provide more information on it? --> The Atlas Hive hook is used by Hive to support listeners on Hive command execution. It adds/updates/removes entities in Atlas using the model defined in org.apache.atlas.hive.model.HiveDataModelGenerator. The hook submits the request to a thread pool executor to avoid blocking the command execution; the thread submits the entities as messages to the notification server, and the Atlas server reads these messages and registers the entities. Follow these instructions in your Hive setup to add the Hive hook for Atlas.
Question 4: How to create geo-based and time-based policies using Atlas? --> As far as I know, currently you can only integrate your tag sync policies into Atlas.
04-25-2017
05:39 PM
@Shashidhar Janne
You can use Java's SimpleDateFormat, for example:
val format = new java.text.SimpleDateFormat("yyyy-MM-dd")
format.parse("2013-07-06")
Also worth a look: https://github.com/nscala-time/nscala-time/blob/master/README.md
03-23-2017
04:44 PM
Awesome @Ken Jiiii. hive-site.xml should be available across the cluster in /etc/spark/conf (which /usr/hdp/current/spark-client/conf symlinks to), and the Spark client needs to be installed on all the worker nodes for yarn-cluster mode to run, since your Spark driver can run on any worker node and needs the client installed with spark/conf. If you are using Ambari, it takes care of making hive-site.xml available in spark-client/conf/.
03-22-2017
04:45 PM
1 Kudo
Thanks @Ken Jiiii. Looking at your error, the application master failed 2 times due to exit code 15. Did you check that you have placed hive-site.xml in your spark/conf? Also, in your code, can you try removing ".setMaster("local[2]")" since you are running on YARN, and then try running it with:
spark-submit --class com.test.spark.Test --master yarn-cluster hdfs://HDP25/test.jar
03-21-2017
08:57 PM
2 Kudos
@Ken Jiiii You can follow this link, which has an example and a pom.xml. Answering your question "I do not need any cluster or Hortonworks specific things in my pom, right?" - yes, you don't. All those values should be in your code or client configs (core-site.xml, yarn-site.xml).
03-11-2017
02:22 AM
2 Kudos
1. You can update the sharelib with the following jars, or they can be passed directly in the oozie workflow.xml. (Make sure you use the 3.2 version of the datanucleus jars, not 4.x.)
/usr/hdp/current/spark-client/lib/datanucleus-rdbms-3.2.9.jar
/usr/hdp/current/spark-client/lib/datanucleus-core-3.2.10.jar
/usr/hdp/current/spark-client/lib/datanucleus-api-jdo-3.2.6.jar
To copy the jars to the sharelib:
# hdfs dfs -put /usr/hdp/current/spark-client/lib/datanucleus-rdbms-3.2.9.jar /user/oozie/share/lib/lib_*/spark/
If you copy the jars to the sharelib, make sure to run the oozie sharelibupdate.
Update oozie sharelib: # oozie admin -oozie http://<oozie-server>:11000/oozie -sharelibupdate
Verify the current spark action sharelib with all the above files: # oozie admin -oozie http://<oozie-server>:11000/oozie -shareliblist spark*
Make sure you have hive-site.xml in sharelib too and have the following properties in it. Replace the values with your hive-site.xml values. <configuration>
<property>
<name>hive.metastore.kerberos.keytab.file</name>
<value>/etc/security/keytabs/hive.service.keytab</value>
</property>
<property>
<name>hive.metastore.kerberos.principal</name>
<value>hive/_HOST@SANDBOX.COM</value>
</property>
<property>
<name>hive.metastore.sasl.enabled</name>
<value>true</value>
</property>
<property>
<name>hive.metastore.uris</name>
<value>thrift://vb-atlas-node1.hortonworks.com:9083</value>
</property>
<property>
<name>hive.server2.authentication</name>
<value>KERBEROS</value>
</property>
<property>
<name>hive.server2.authentication.kerberos.keytab</name>
<value>/etc/security/keytabs/hive.service.keytab</value>
</property>
<property>
<name>hive.server2.authentication.kerberos.principal</name>
<value>hive/_HOST@SANDBOX.COM</value>
</property>
<property>
<name>hive.server2.authentication.spnego.keytab</name>
<value>/etc/security/keytabs/spnego.service.keytab</value>
</property>
<property>
<name>hive.server2.authentication.spnego.principal</name>
<value>HTTP/_HOST@SANDBOX.COM</value>
</property>
<property>
<name>hive.metastore.warehouse.dir</name>
<value>/apps/hive/warehouse</value>
</property>
<property>
<name>hive.metastore.cache.pinobjtypes</name>
<value>Table,Database,Type,FieldSchema,Order</value>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
</property>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://vb-atlas-node1.hortonworks.com/hive?createDatabaseIfNotExist=true</value>
</property>
</configuration>
2. Create a workflow.xml; please make sure you replace the metastore URL and the jar's location.
<workflow-app name="spark-wf" xmlns="uri:oozie:workflow:0.5">
  <credentials>
    <credential name='hcat_auth' type='hcat'>
      <property>
        <name>hcat.metastore.uri</name>
        <value>thrift://vb-atlas-node1.hortonworks.com:9083</value>
      </property>
      <property>
        <name>hcat.metastore.principal</name>
        <value>hive/_HOST@SANDBOX.COM</value>
      </property>
    </credential>
  </credentials>
  <start to="spark-action"/>
  <action name="spark-action" cred='hcat_auth'>
    <spark xmlns="uri:oozie:spark-action:0.1">
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <prepare>
        <delete path="${nameNode}/user/spark/sparkOozie/output-data/spark"/>
      </prepare>
      <master>${master}</master>
      <name>Spark Hive Example</name>
      <class>com.hortonworks.vinod.SparkSqlExample</class>
      <jar>${nameNode}/user/{User_You_run_as}/lib/Spark-Example-vinod-0.0.1-SNAPSHOT.jar</jar>
      <spark-opts>--driver-memory 512m --executor-memory 512m --num-executors 1 --jars /usr/hdp/current/spark-client/lib/datanucleus-rdbms-3.2.9.jar,/usr/hdp/current/spark-client/lib/datanucleus-core-3.2.10.jar,/usr/hdp/current/spark-client/lib/datanucleus-api-jdo-3.2.6.jar --files /usr/hdp/current/spark-client/conf/hive-site.xml</spark-opts>
      <arg>thrift://vb-atlas-node1.hortonworks.com:9083</arg>
    </spark>
    <ok to="end"/>
    <error to="kill"/>
  </action>
  <kill name="kill">
    <message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
  </kill>
  <end name="end"/>
</workflow-app>
3. Upload the jar that runs the program and the input file to the home path of the user you run the oozie job as, and upload workflow.xml to HDFS. For example:
# hdfs dfs -put Spark-Example-vinod-0.0.1-SNAPSHOT.jar /user/{User_You_run_as}/lib/Spark-Example-vinod-0.0.1-SNAPSHOT.jar
# hdfs dfs -put input.txt /user/{User_You_run_as}/
# hdfs dfs -put workflow.xml /user/{User_You_run_as}/
4. Configure job.properties:
nameNode=hdfs://<namenode_HOST>:8020
jobTracker=<Resource_Manager_Host>:8050
oozie.wf.application.path=/user/{User_You_run_as}/
oozie.use.system.libpath=true
master=yarn-cluster
5. Run the oozie job with the properties:
# oozie job -oozie http://<oozie-server>:11000/oozie/ -config job.properties -run
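You can then track the job status with the standard Oozie CLI (the job id is the one printed by the -run command):
# oozie job -oozie http://<oozie-server>:11000/oozie -info <job_id>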
You should see the Spark Hive Example application in the Resource Manager, and the output will be in stdout:
Log Type: stdout
Log Upload Time: Fri Mar 10 22:30:16 +0000 2017
Log Length: 99
+---+-------+
| id|   name|
+---+-------+
|  1|sample1|
|  2|sample2|
|  3|sample3|
+---+-------+
6. com.hortonworks.vinod.SparkSqlExample.class
package com.hortonworks.vinod;
import java.io.IOException;
import javax.security.auth.login.Configuration;
import org.apache.spark.SparkConf;
import org.apache.spark.SparkContext;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.hive.HiveContext;
public class SparkSqlExample {
public static void main(String[] args) throws IOException {
org.apache.hadoop.conf.Configuration conf = new org.apache.hadoop.conf.Configuration();
conf.addResource("/etc/hadoop/conf/core-site.xml");
conf.addResource("/etc/hadoop/conf/hdfs-site.xml");
conf.addResource("/etc/hive/conf/hive-site.xml");
FileSystem fs = FileSystem.get(conf);
SparkConf sparkConf = new SparkConf().setAppName("JavaSparkSQL");
SparkContext sss = new SparkContext(sparkConf);
// JavaSparkContext ctx = new JavaSparkContext(sparkConf);
HiveContext hivecontex = new HiveContext(sss);
hivecontex.sql("create external table if not exists SparkHiveExample ( id int, name string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TextFile");
hivecontex.sql("LOAD DATA INPATH 'input.txt' OVERWRITE INTO TABLE SparkHiveExample");
DataFrame df = hivecontex.sql("select * from SparkHiveExample");
df.show();
}
}
7. pom.xml
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>com.hortonworks.sparkExample</groupId>
<artifactId>Spark-Example-vinod</artifactId>
<version>0.0.1-SNAPSHOT</version>
<name>Spark Examples</name>
<description>Spark programs </description>
<parent>
<groupId>org.apache.spark</groupId>
<artifactId>spark-parent_2.10</artifactId>
<version>1.6.2</version>
</parent>
<dependencies>
<!-- https://mvnrepository.com/artifact/org.apache.spark/spark-core_2.10 -->
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.10</artifactId>
<version>1.6.0</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.spark/spark-hive_2.10 -->
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-hive_2.10</artifactId>
<version>1.6.1</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.scala-lang/scala-library -->
<dependency>
<groupId>org.scala-lang</groupId>
<artifactId>scala-library</artifactId>
<version>2.10.6</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-mapreduce-client-core -->
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-mapreduce-client-core</artifactId>
<version>2.7.0</version>
</dependency>
</dependencies>
</project>
02-13-2017
09:01 PM
12 Kudos
As we know, many services like Atlas (lineage), Ranger (audit logs), Log Search and so on use Ambari Infra (Solr) for indexing data, so moving Ambari Infra into production and keeping it stable and up is really important. These are the key points I came up with to make this happen.
Hardware - Try to have a minimum of 3 Ambari Infra nodes with at least 1-2TB of disk for Solr data storage, though the right size mainly depends on how many components (Ranger, Atlas, Log Search, ...) and how much data will be fed into Solr for indexing. A major driving factor for Solr performance is RAM. Solr requires sufficient memory for two separate things: one is the Java heap, the other is free memory for the OS disk cache. Let's say that you have a Solr index size of 8GB. If your OS, Solr's Java heap, and all other running programs require 4GB of memory, then an ideal memory size for that server is at least 12GB. So how much memory do I need for Ambari Infra? This is one of those questions that has no generic answer. You want a heap that's large enough that you don't have OOM exceptions and problems with constant garbage collection, but small enough that you're not wasting memory or running into huge garbage collection pauses. Ideally we can start with 8GB total memory (leaving 4GB for disk cache) initially, but that also might NOT be enough. The really important thing is to ensure that there is a high cache hit ratio on the OS disk cache.
GC - GC pauses are usually caused by full garbage collections, i.e. pauses of all program execution to clean up memory. GC tuning is an art form, and what works for one person may not work for you. Using the ConcurrentMarkSweep (CMS) collector with tuning parameters is a very good option for Solr, but with the latest Java 7 releases (7u72 at the time of this writing), G1 is looking like a better option if the -XX:+ParallelRefProcEnabled option is used. Information from Oracle engineers who specialize in GC indicates that the latest Java 8 will noticeably improve G1 performance over Java 7, but that has not been confirmed. Here are some ideas that hopefully you will find helpful:
The "MaxNewSize" should not be low: because the application uses caches, setting it to a low value will cause the temporary cache data to be moved to the Old Generation prematurely. Once objects are moved to the Old Gen, they only get cleared during a full GC phase, and until then they remain in the heap. In general we should set "MaxNewSize" (the young generation heap size) to at least 1/6 (recommended) or 1/8 of the max heap. If the application creates many more short-lived temporary objects, MaxNewSize can be increased further. Example : -Xmx8192m -Xms8192m -XX:MaxNewSize=1365m
Because the throughput collector normally starts a GC cycle only when the heap is full (or reaches max), the CMS collector needs to start a GC cycle much earlier than the throughput collector in order to finish before the application runs out of memory. Setting -XX:CMSInitiatingOccupancyFraction=65 -XX:+UseCMSInitiatingOccupancyOnly will help reduce long GC pauses, because it lets the JVM proactively clean the heap when it reaches 65% instead of waiting for it to be 90% full and above.
Zookeeper - As we know, Solr uses Zookeeper to manage configs and coordination. Solr doesn't use Zookeeper that intensively compared to other services (Kafka, services HA, ...). Since SolrCloud relies on Zookeeper, it can be very unstable if you have underlying performance issues that result in operations taking longer than the zkClientTimeout. Increasing that timeout can help, but addressing the underlying performance issues will yield better results. The default timeout of 30 seconds should be more than enough for a well-tuned SolrCloud. We always strongly recommend storing the Zookeeper data on physical disks separate from other services and the OS. Having dedicated machines when multiple services use ZK is even better, but not a requirement.
Availability - Having multiple shards with replication helps to keep the Solr collections available in most cases, such as nodes going down. By default most of the collections are created with 1 shard and 1 replica. We can use the following commands to split a shard or recreate a collection with multiple shards. For example, for the Ranger audit log we can split the existing shard or recreate the collection; if it is a new install or in the initial stages, I would delete and recreate the collection.
To delete the ranger_audits collection:
http://vb-atlas-ambari.hortonworks.com:8886/solr/admin/collections?action=delete&name=ranger_audits
If you don't have the Solr UI enabled or accessible, you can use the spnego principal and run the below command from the command line:
curl -i --negotiate -u : "http://vb-atlas-ambari.hortonworks.com:8886/solr/admin/collections?action=delete&name=ranger_audits"
Create a new ranger_audits collection:
http://vb-atlas-ambari.hortonworks.com:8886/solr/admin/collections?action=create&name=ranger_audits&numShards=3&replicationFactor=2&collection.configName=ranger_audits
Or from the command line:
curl -i --negotiate -u : "http://vb-atlas-ambari.hortonworks.com:8886/solr/admin/collections?action=create&name=ranger_audits&numShards=3&replicationFactor=2&collection.configName=ranger_audits"
You can also specify the Solr nodes where your shards can land:
http://vb-atlas-ambari.hortonworks.com:8886/solr/admin/collections?action=create&name=ranger_audits&numShards=3&replicationFactor=2&collection.configName=ranger_audits&createNodeSet=xhadambum1p.hortonworks.com:8886/solr,xhadambum2p.hortonworks.com:8886/solr,xhadambum3p.hortonworks.com:8886/solr
NOTE: Since we are using the same collection.configName, we don't need to provide the configs again for the collection.
Split Shard
The below command splits shard1 into 2 shards, shard1_0 and shard1_1:
http://vb-atlas-ambari.hortonworks.com:8886/solr/admin/collections?collection=ranger_audits&shard=shard1&action=SPLITSHARD
Disk Space
Sometimes having a high expiration for documents can fill up the disk space in case of heavy traffic, so configuring the right TTL can eliminate this kind of disk space alert. For example, by default ranger_audits has a 90-day TTL; this can be changed if needed.
If you haven't used Solr audits before and haven't enabled Ranger audits to Solr via Ambari yet, it is easy to adjust the TTL configuration. By default Ranger keeps its solrconfig.xml in /usr/hdp/2.5.0.0-1245/ranger-admin/contrib/solr_for_audit_setup/conf/solrconfig.xml, so you can directly edit that solrconfig.xml file and change +90DAYS to the desired value.
<updateRequestProcessorChain name="add-unknown-fields-to-the-schema">
<processor>
<str name="fieldName">_ttl_</str>
<str name="value">+60DAYS</str>
</processor>
<processor>
<int name="autoDeletePeriodSeconds">86400</int>
<str name="ttlFieldName">_ttl_</str>
<str name="expirationFieldName">_expire_at_</str>
</processor>
<processor>
<str name="fieldName">_expire_at_</str>
</processor>
Afterwards, you can go to Ambari and enable Ranger Solr audits; the collection that is going to be created will use the new setting.
If you already configured Ranger audits to Solr, go to one of the Ambari Infra nodes that hosts a Solr instance. You can download the solrconfig.xml or change the existing one for the component you have.
To download:
/usr/lib/ambari-infra-solr/server/scripts/cloud-scripts/zkcli.sh -cmd getfile /infra-solr/configs/ranger_audits/solrconfig.xml solrconfig.xml -z vb-atlas-ambari.hortonworks.com:2181
Edit the downloaded solrconfig.xml and change the TTL, then upload the config back to Zookeeper:
/usr/lib/ambari-infra-solr/server/scripts/cloud-scripts/zkcli.sh -cmd putfile /infra-solr/configs/ranger_audits/solrconfig.xml solrconfig.xml -z vb-atlas-ambari.hortonworks.com:2181
Reload the config:
http://vb-atlas-ambari.hortonworks.com:8886/solr/admin/collections?action=RELOAD&name=ranger_audits
Or from the command line:
curl -v --negotiate -u : "http://vb-atlas-ambari.hortonworks.com:8886/solr/admin/collections?action=RELOAD&name=ranger_audits"
After changing the TTL from +90DAYS to +60DAYS you can verify the documents:
curl -i --negotiate -u : "http://vb-atlas-ambari.hortonworks.com:8886/solr/ranger_audits_shard1_replica1/select?q=_ttl_%3A%22%2B60DAYS%22%0A&wt=json&indent=true"
or from the Solr query UI, set q to _ttl_:"+60DAYS"
{
  "responseHeader":{
    "status":0,
    "QTime":6,
    "params":{
      "q":"_ttl_:\"+60DAYS\"\n",
      "indent":"true",
      "wt":"json"}},
  "response":{"numFound":38848,"start":0,"docs":[
      {
        "id":"004fa587-c531-429a-89a6-acf947d93c39-70574",
        "access":"WRITE",
        "enforcer":"hadoop-acl",
        "repo":"vinodatlas_hadoop",
        "reqUser":"spark",
        "resource":"/spark-history/.133f95bb-655f-450f-8aea-b87288ee2748",
        "cliIP":"172.26.92.153",
        "logType":"RangerAudit",
        "result":1,
        "policy":-1,
        "repoType":1,
        "resType":"path",
        "reason":"/spark-history",
        "action":"write",
        "evtTime":"2017-02-08T23:08:08.103Z",
        "seq_num":105380,
        "event_count":1,
        "event_dur_ms":0,
        "_ttl_":"+60DAYS",
        "_expire_at_":"2017-04-09T23:08:09.406Z",
        "_version_":1558808142185234432},
      {
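To see how shards and replicas are laid out after creating or splitting collections, the Collections API CLUSTERSTATUS action is handy (same Solr host placeholder as in the examples above):
curl -i --negotiate -u : "http://vb-atlas-ambari.hortonworks.com:8886/solr/admin/collections?action=CLUSTERSTATUS&collection=ranger_audits"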
01-01-2017
12:15 AM
Setting up Log Search SSL and HTTPS A keystore and a truststore are required for this setup. These instructions assume that you have already created .jks for the keystore and truststore 1. Create keystore location a. Keystore Setup Place the keystore in /etc/security/certs/ and else by using a symlink you can point to another location of your keystore.jks b. Ensure log search user can read the keystore chown logsearch:hadoop *.keyStore.jks 2. Create a truststore for logsearch: a. Cert signed by CA: i. Copy the keystore into <host>.trustStore.jks ii. Create a symlink to this similar to the keystore /etc/security/certs/truststore.jks -> /etc/security/certs/<host>.trustStore.jks b. Ensure log search user can read the trust store chown logsearch:hadoop *.trustStore.jks 3. Update Ambari configuration a. Update logsearch UI Protocol to https b. Update Trust store location ( logsearch_truststore_location ) and password c. Update Keystore location ( logsearch_keystore_location ) and password 4. Restart log search server UPDATE Logsearch Alert in Ambari Once the Log Search is configured to be accessed using SSL, the following steps are to be performed to update the Alert Definition of "Log Search Web UI" to check https URL. Note: Please replace the variables with appropriate values for your cluster ( Admin credentials, Ambari host and Cluster name ) 1. GET Alert Definition ID. Execute the below command, by replacing the variables with appropriate values. and search for logsearch_ui section curl -s -k -u $AMB_USER:$AMB_PASS -H 'X-Requested-By: ambari' -X GET http://<Ambari_HOST>:8443/api/v1/clusters/<CLUSTER_NAME>/alert_definitions Sample output for logsearch_ui section: { "href" : "http://sandbox.hortonworks.com:8443/api/v1/clusters/sandbox/alert_definitions/451", "AlertDefinition" : { "cluster_name" : “sandbox", "id" : 451, "label" : "Log Search Web UI", "name" : "logsearch_ui" } }, 2. GET the Alert Definition. Use the href value from the above step's sample output to get the Alert Definition of "Log Search Web UI" by executing the below command. curl -s -k -u $AMB_USER:$AMB_PASS -H 'X-Requested-By: ambari' -X GET http://sandbox.hortonworks.com:8443/api/v1/clusters/sandbox/alert_definitions/451 Sample Output: { "href" : "http://sandbox.hortonworks.com.com:8443/api/v1/clusters/sandbox/alert_definitions/451", "AlertDefinition" : { "cluster_name" : “sandbox", "component_name" : "LOGSEARCH_SERVER", "description" : "This host-level alert is triggered if the Log Search UI is unreachable.", "enabled" : true, "help_url" : null, "id" : 451, "ignore_host" : false, "interval" : 1, "label" : "Log Search Web UI", "name" : "logsearch_ui", "repeat_tolerance" : 1, "repeat_tolerance_enabled" : false, "scope" : "ANY", "service_name" : "LOGSEARCH", "source" : { "reporting" : { "critical" : { "text" : "Connection failed to {1} ({3})" }, "ok" : { "text" : "HTTP {0} response in {2:.3f}s" }, "warning" : { "text" : "HTTP {0} response from {1} in {2:.3f}s ({3})" } }, "type" : "WEB", "uri" : { "http": "{{logsearch-env/logsearch_ui_port}}", "https": "{{logsearch-env/logsearch_ui_port}}", "default_port": 61888, "connection_timeout": 5 } } } } 3. Create a temp file with new variables. Create a temp file (in this example: logsearch_uri) with below contents to update the URI sections to include https_property and https_property_value variables and values. 
logsearch_uri file contents: { "AlertDefinition": { "source": { "reporting": { "ok": { "text": "HTTP {0} response in {2:.3f}s" }, "warning": { "text": "HTTP {0} response from {1} in {2:.3f}s ({3})" }, "critical": { "text": "Connection failed to {1} ({3})" } }, "type": "WEB", "uri": { "http": "{{logsearch-env/logsearch_ui_port}}", "https": "{{logsearch-env/logsearch_ui_port}}", "https_property": "{{logsearch-env/logsearch_ui_protocol}}", "https_property_value": "https", "default_port": 61888, "connection_timeout": 5 } } } } 4. PUT the updated Alert Definition. Execute the below command to update the Alert Definition using logsearch_uri file created in the previous step. There will be no output to displayed after the execution of this command. curl -s -k -u $AMB_USER:$AMB_PASS -H 'X-Requested-By: ambari' -X PUT -d @logsearch_uri http://sandbox.hortonworks.com:8443/api/v1/clusters/sandbox/alert_definitions/451 5. Validate the update Execute again the get Alert Definition command (as below) and verify the https_property and https_propert_value are now part of uri section. curl -s -k -u $AMB_USER:$AMB_PASS -H 'X-Requested-By: ambari' -X GET http://sandbox.hortonworks.com:8443/api/v1/clusters/sandbox/alert_definitions/451 Sample Output: { "href" : "http://sandbox.hortonworks.com:8443/api/v1/clusters/sandbox/alert_definitions/451", "AlertDefinition" : { "cluster_name" : “sandbox", "component_name" : "LOGSEARCH_SERVER", "description" : "This host-level alert is triggered if the Log Search UI is unreachable.", "enabled" : true, "help_url" : null, "id" : 451, "ignore_host" : false, "interval" : 1, "label" : "Log Search Web UI", "name" : "logsearch_ui", "repeat_tolerance" : 1, "repeat_tolerance_enabled" : false, "scope" : "ANY", "service_name" : "LOGSEARCH", "source" : { "reporting" : { "critical" : { "text" : "Connection failed to {1} ({3})" }, "ok" : { "text" : "HTTP {0} response in {2:.3f}s" }, "warning" : { "text" : "HTTP {0} response from {1} in {2:.3f}s ({3})" } }, "type" : "WEB", "uri" : { "http": "{{logsearch-env/logsearch_ui_port}}", "https": "{{logsearch-env/logsearch_ui_port}}", "https_property": "{{logsearch-env/logsearch_ui_protocol}}", "https_property_value": "https", "default_port": 61888, "connection_timeout": 5 } } } } NOTE: In the first if you had disabled Alert Definition for "Log Search Web UI" in Ambari, then Enable it again, else wait for the time interval for alert check to execute.
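The steps at the top assume the keystore and truststore .jks files already exist. If you still need to create a self-signed pair for testing, something along these lines works (the alias, validity, and filenames here are examples, not requirements):
keytool -genkeypair -alias logsearch -keyalg RSA -keysize 2048 -validity 365 -keystore /etc/security/certs/<host>.keyStore.jks
keytool -exportcert -alias logsearch -file logsearch.crt -keystore /etc/security/certs/<host>.keyStore.jks
keytool -importcert -alias logsearch -file logsearch.crt -keystore /etc/security/certs/<host>.trustStore.jks
chown logsearch:hadoop /etc/security/certs/*.jks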
12-30-2016
03:35 PM
1 Kudo
SUMMARY: How to enable performance logging in Atlas, where we can track each operation's time, such as the time taken to get entities or to get lineage info. It helps while debugging if the Atlas UI or API is taking a long time to return results: we can check which phase is slow and debug accordingly. Example :
2016-12-20 14:24:02,344|qtp1381713434-59648 - ce3e660e-bdcb-4656-805d-7a99d0b9ddb6|PERF|EntityResource.getEntityDefinition()|452
2016-12-20 14:24:02,432|qtp1381713434-59901 - d15c9039-945a-4a87-abf2-017fdde22ad6|PERF|EntityResource.getEntityDefinition()|6
2016-12-20 14:24:02,553|qtp1381713434-59893 - d9624e31-6c1f-4900-8269-e9f14dfb0a09|PERF|EntityResource.getAuditEvents(03b90ea3-a307-4cfd-ba93-79a2a7cbadf8, null, 26)|117
2016-12-20 14:24:02,643|qtp1381713434-59896 - 775b3108-49e4-4c69-af65-b028a21b26b3|PERF|LineageResource.schema(03b90ea3-a307-4cfd-ba93-79a2a7cbadf8)|207
2016-12-20 14:24:03,176|qtp1381713434-59894 - 98047e2d-181b-4a41-bdf1-4d273a4cc7a3|PERF|LineageResource.inputsGraph(03b90ea3-a307-4cfd-ba93-79a2a7cbadf8)|750
2016-12-20 14:24:03,936|qtp1381713434-59857 - 1dff70bd-03d8-4f42-a294-440cd19e4d41|PERF|LineageResource.outputsGraph(03b90ea3-a307-4cfd-ba93-79a2a7cbadf8)|732
2016-12-20 14:26:48,452|NotificationHookConsumer thread-0|PERF|EntityResource.deleteEntities()|2184
STEPS:
1. Go to Ambari -> Atlas -> Config -> Advanced -> Atlas-log4j and add the following to atlas-log4j from Ambari:
<appender name="perf_appender">
<param name="file" value="${atlas.log.dir}/atlas_perf.log" />
<param name="datePattern" value="'.'yyyy-MM-dd" />
<param name="append" value="true" />
<layout>
<param name="ConversionPattern" value="%d|%t|%m%n" />
</layout>
</appender>
<logger name="org.apache.atlas.perf" additivity="false">
<level value="debug" />
<appender-ref ref="perf_appender" />
</logger>
2. Save your config changes and do the required restarts (restart Atlas).
3. You should now see performance logging in /var/log/atlas/atlas_perf.log