Member since
12-02-2015
42
Posts
28
Kudos Received
3
Solutions
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 765 | 04-03-2018 03:30 AM
 | 482 | 04-25-2017 09:27 PM
 | 3447 | 03-22-2017 04:45 PM
06-27-2018
11:54 PM
1 Kudo
The current version of Ambari doesn't monitor the number of HiveServer2 connections. We often see HiveServer2 slowness under heavy load caused by an increase in the number of connections to HiveServer2. Setting up an alert on HiveServer2 established connections helps us take the required actions, such as adding additional HiveServer2 instances, load balancing properly, or scheduling the jobs. NOTE: Please go through this article https://github.com/apache/ambari/blob/2.6.2-maint/ambari-server/docs/api/v1/alert-definitions.md to understand Ambari Alert Definitions. Please find the python script and .json file used below in the attachments.
alert_hiveserver_num_connection.py - the python script that finds the current established connections for each HiveServer2 and, based on the number of connections, returns a 'CRITICAL', 'WARN', or 'OK' alert.
alerths.json - the Ambari Alert definition.
Below are the steps to set up the Ambari alert on HiveServer2 established connections.
Step 1 - Place the file "alert_hiveserver_num_connection.py" in the following path on the ambari-server: "/var/lib/ambari-server/resources/common-services/HIVE/0.12.0.2.0/package/alerts/"
[root@vb-atlas-ambari tmp]# cp alert_hiveserver_num_connection.py /var/lib/ambari-server/resources/common-services/HIVE/0.12.0.2.0/package/alerts/
Step 2 - Restart Ambari Server to force the Ambari agents to pull the alert_hiveserver_num_connection.py script to every host.
ambari-server restart
Once Ambari Server is restarted, we can verify that alert_hiveserver_num_connection.py is available in "/var/lib/ambari-agent/cache/common-services/HIVE/0.12.0.2.0/package/alerts/" on the HiveServer2 host.
Note: Sometimes it takes longer for the Ambari agent to pull the script from the Ambari server.
[root@vb-atlas-node1 ~]# ll /var/lib/ambari-agent/cache/common-services/HIVE/0.12.0.2.0/package/alerts/
total 116
-rw-r--r--. 1 root root 9740 Jun 27 17:01 alert_hive_interactive_thrift_port.py
-rw-r--r--. 1 root root 7893 Jun 27 17:01 alert_hive_interactive_thrift_port.pyo
-rw-r--r--. 1 root root 9988 Jun 27 17:01 alert_hive_metastore.py
-rw-r--r--. 1 root root 9069 Jun 27 17:01 alert_hive_metastore.pyo
-rw-r--r--. 1 root root 1888 Jun 27 17:01 alert_hiveserver_num_connection.py
-rw-r--r--. 1 root root 11459 Jun 27 17:01 alert_hive_thrift_port.py
-rw-r--r--. 1 root root 9362 Jun 27 17:01 alert_hive_thrift_port.pyo
-rw-r--r--. 1 root root 11946 Jun 27 17:01 alert_llap_app_status.py
-rw-r--r--. 1 root root 9339 Jun 27 17:01 alert_llap_app_status.pyo
-rw-r--r--. 1 root root 8886 Jun 27 17:01 alert_webhcat_server.py
-rw-r--r--. 1 root root 6563 Jun 27 17:01 alert_webhcat_server.pyo
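For reference, the check the script performs amounts to counting established TCP connections on the HiveServer2 port; a quick manual equivalent on a HiveServer2 host (assuming the default binary port 10000, adjust for your config) looks like this:
# Count established connections to HiveServer2 (default binary port 10000)
netstat -ant | awk '$4 ~ /:10000$/ && $6 == "ESTABLISHED"' | wc -l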
Step 3 - Post the Alert Definition (alerths.json) to Ambari using curl:
curl -u <Ambari_admin_username>:<Ambari_admin_password> -i -H 'X-Requested-By:ambari' -X POST -d @alerths.json http://<AMBARI_HOST>:<AMBARI_PORT>/api/v1/clusters/<CLUSTER_NAME>/alert_definitions
Example :
[root@vb-atlas-ambari tmp]# curl -u admin:admin -i -H 'X-Requested-By:ambari' -X POST -d @alerths.json http://172.26.108.142:8080/api/v1/clusters/vinod/alert_definitions
HTTP/1.1 201 Created
X-Frame-Options: DENY
X-XSS-Protection: 1; mode=block
X-Content-Type-Options: nosniff
Cache-Control: no-store
Pragma: no-cache
Set-Cookie: AMBARISESSIONID=10f33laf224yy1834ygq9cekbo;Path=/;HttpOnly
Expires: Thu, 01 Jan 1970 00:00:00 GMT
User: admin
Content-Type: text/plain
Content-Length: 0
We should now be able to see the alert in Ambari -> Alerts (HiveServer2 Established Connections). Alternatively, we can also see "HiveServer2 Established Connections" listed in the alert definitions at "http://<AMBARI_HOST>:<AMBARI_PORT>/api/v1/clusters/<CLUSTER_NAME>/alert_definitions".
Step 4 - As per the Alert Definition (alerths.json), the CRITICAL alert is set to 50 connections and WARNING to 30 connections by default. You can update these thresholds directly from Ambari by editing the alert.
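To confirm the definition was registered from the command line, listing the alert definitions and filtering on the label works as well (same credential and host placeholders as above):
curl -s -u admin:admin -H 'X-Requested-By:ambari' "http://<AMBARI_HOST>:<AMBARI_PORT>/api/v1/clusters/<CLUSTER_NAME>/alert_definitions" | grep -i "HiveServer2 Established Connections"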
06-14-2018
05:35 PM
2 Kudos
In DPS-1.1.0 we can't remove a cluster from the DPS UI, but we can use curl to remove it. Note: a user with the Dataplane Admin role can perform the steps below. (screenshot: screen-shot-2018-06-14-at-102253-am.png)
To delete the smayani-hdp cluster:
Step 1 - Find the ID of the cluster you want to remove. You can use the developer tools in the browser to find the cluster ID. (screenshot: screen-shot-2018-06-14-at-101629-am.png) In the above example the smayani-hdp cluster ID is 3 ( https://172.26.125.109/api/lakes/3/servicesDetails ).
Step 2 - From the console, use the below curl to remove the cluster.
curl -k -u <username>:<Password> -X DELETE https://<DPS_HOST>/api/lakes/<cluster_ID>
Example : curl -k -u admin:kjncsadasdcsdc -X DELETE https://172.26.125.109/api/lakes/3
Once the above is executed you should no longer see the cluster in the UI. (screenshot: screen-shot-2018-06-14-at-102509-am.png)
Alternatively, you can also use rm_dp_cluster.sh in /usr/dp/current/core/bin on the server where DPS is installed.
Usage: ./rm_dp_cluster.sh DP_JWT HADOOP_JWT DP_HOST_NAME CLUSTER_NAME DATA_CENTER_NAME
DP_JWT: Value of the dp_jwt cookie from a valid user's browser session
HADOOP_JWT: Value of the hadoop-jwt cookie from a valid user's browser session
DP_HOST_NAME: Hostname or IP address of the DataPlane server
CLUSTER_NAME: Name of the cluster to delete
DATA_CENTER_NAME: Name of the datacenter the cluster belongs to
You can use the browser developer tools to find the cookie values (DP_JWT, HADOOP_JWT).
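A worked invocation with the placeholders filled in (the cookie values and datacenter name below are hypothetical examples) would look like:
cd /usr/dp/current/core/bin
./rm_dp_cluster.sh "<dp_jwt_cookie_value>" "<hadoop_jwt_cookie_value>" 172.26.125.109 smayani-hdp <data_center_name>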
06-11-2018
03:37 AM
@Abhiram Gattamaneni The HDP distribution doesn't support the MapR filesystem, nor does it ship the MapR client jars. The above error has nothing to do with a Kerberized environment. My suggestion would be to have a MapR NFS mount on the client node and use hadoop copyFromLocal to copy the file to HDFS, and vice versa.
06-06-2018
09:35 PM
OBJECTIVE: Update the log configs of the DPS App. For example, the default log file is set to logs/application.log, which can be changed, or the log level can be raised to DEBUG for troubleshooting. Since the DP App runs in docker, we can use docker commands to update the configs.
STEPS:
1. Find the docker container running the DP App on the host running DPS, using "docker ps".
[root@dps-node ~]# docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
abd412417907 hortonworks/dlm-app:1.1.0.0-41 "runsvdir /etc/sv" 28 hours ago Up 2 hours 9011/tcp dlm-app
62620e578e31 hortonworks/dp-app:1.1.0.0-390 "/bootstrap.sh" 2 days ago Up 16 minutes 0.0.0.0:80->80/tcp, 0.0.0.0:443->443/tcp, 9000/tcp dp-app
38dda17dfdf4 hortonworks/dp-cluster-service:1.1.0.0-390 "./docker_service_st…" 2 days ago Up 2 days 9009-9010/tcp
Copy the container ID; from the above example it is "62620e578e31".
2. Get the current logback.xml file.
[root@dps-node ~]# docker exec -it 62620e578e31 /bin/cat /usr/dp-app/conf/logback.xml > logback.xml
3. Update the configs in the local logback.xml we redirected to in the above command. Below, I have updated the location from the default logs/application.log to /usr/dp-app/logs/.
<configuration>
<appender name="FILE" class="ch.qos.logback.core.rolling.RollingFileAppender">
<file>/usr/dp-app/logs/application.log</file>
<rollingPolicy class="ch.qos.logback.core.rolling.TimeBasedRollingPolicy">
<!-- Daily rollover with compression -->
.
.
<appender name="AKKA" class="ch.qos.logback.core.rolling.RollingFileAppender">
<file>/usr/dp-app/logs/akka.log</file>
<rollingPolicy class="ch.qos.logback.core.rolling.TimeBasedRollingPolicy">
.
.
.
</encoder>
</appender>
<appender name="ACCESS_FILE" class="ch.qos.logback.core.rolling.RollingFileAppender">
<file>/usr/dp-app/logs/access.log</file>
<rollingPolicy class="ch.qos.logback.core.rolling.TimeBasedRollingPolicy">
.
.
We can also update the log level.
<root level="DEBUG">
<appender-ref ref="FILE"/>
</root>
4. If needed, make a backup of the original logback.xml file, then copy the updated logback.xml into the container.
[root@dps-node ~]# docker exec -it 62620e578e31 /bin/cp /usr/dp-app/conf/logback.xml /usr/dp-app/conf/logback.xml.bck
[root@dps-node ~]# docker exec -i 62620e578e31 tee /usr/dp-app/conf/logback.xml < logback.xml
5. Restart the docker container to make the changes effective.
[root@dps-node ~]# docker restart 62620e578e31
6. Verify that the changes have been applied.
[root@dps-node ~]# docker exec -it 62620e578e31 /bin/ls -lrt /usr/dp-app/logs
total 64
-rw-r--r-- 1 root root 0 Jun 6 20:50 access.log
-rw-r--r-- 1 root root 62790 Jun 6 21:27 application.log
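Tailing the relocated log is a quick way to confirm the app is writing to the new path (same container ID as above):
[root@dps-node ~]# docker exec -it 62620e578e31 /bin/tail -n 20 /usr/dp-app/logs/application.log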
06-04-2018
05:52 PM
1 Kudo
Short Description: Describes how to manually regenerate keytabs for services through the Ambari REST API.
Article
Make sure the KDC credentials are added to the Ambari credentials store; you can follow this article to do so. Once the KDC credentials are added, you can use the below Ambari REST API call to regenerate keytabs.
curl -H "X-Requested-By:ambari" -u <Ambari_Admin_username>:<Ambari_Admin_password> -X PUT -d '{ "Clusters": { "security_type" : "KERBEROS" } }' "http://<Ambari_HOST>:8080/api/v1/clusters/<Cluster_Name>/?regenerate_keytabs=all"
Example :
curl -H "X-Requested-By:ambari" -u admin:admin -X PUT -d '{ "Clusters": { "security_type" : "KERBEROS" } }' "http://172.26.108.142:8080/api/v1/clusters/vinod/?regenerate_keytabs=all&ignore_config_updates=true"
(Quote the URL so the shell does not interpret the & in the query string.) Once the keytabs are regenerated, the services need to be restarted to use the newly generated keytabs.
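The PUT starts a background operation; its progress can be tracked with the Ambari requests API (the request id below is a placeholder taken from the response of the PUT call):
curl -u admin:admin -H "X-Requested-By:ambari" "http://<Ambari_HOST>:8080/api/v1/clusters/<Cluster_Name>/requests/<request_id>"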
04-23-2018
10:19 PM
Problem Description
Atlas uses Solr to store lineage metadata and uses Zookeeper for coordination and for storing/maintaining configuration. Due to heavy load on Zookeeper on larger clusters, we need to increase the ZK session timeout from the default for some services. One such config is the Ambari Infra (Solr) Zookeeper timeout on the Atlas side.
ERROR:
java.util.concurrent.TimeoutException: Could not connect to ZooKeeper vb-hortonwork.com:2181/infra-solr within 15000 ms
RESOLUTION:
We can increase the timeouts from the default 15000 ms by adding the below properties to custom application-properties in Atlas -> Configs:
atlas.graph.index.search.solr.zookeeper-connect-timeout=60000
atlas.graph.index.search.solr.zookeeper-session-timeout=60000
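If you prefer the command line over the Ambari UI, Ambari's configs.sh script can set these properties in the application-properties config type (the script path and config type name are typical HDP defaults; adjust credentials, host, and cluster name for your environment):
/var/lib/ambari-server/resources/scripts/configs.sh -u admin -p admin set <AMBARI_HOST> <CLUSTER_NAME> application-properties "atlas.graph.index.search.solr.zookeeper-connect-timeout" "60000"
/var/lib/ambari-server/resources/scripts/configs.sh -u admin -p admin set <AMBARI_HOST> <CLUSTER_NAME> application-properties "atlas.graph.index.search.solr.zookeeper-session-timeout" "60000"
A restart of Atlas is still required for the new values to take effect.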
04-03-2018
03:30 AM
1 Kudo
@Saikiran Parepally It's been fixed in HDF-3.1. Please use the nifi.web.proxy.host property to add the hosts.
01-26-2018
06:05 PM
2 Kudos
Generally we use chown/chmod to change ownership and permissions. When we run chown/chmod on a directory that contains a few million objects it takes a very long time, sometimes even days. To reduce the time and make the changes in one command instead of two (chown and chmod), you can use DistCh, which is faster than regular chown and chmod because it runs as a distributed MapReduce job. Below is the command to invoke DistCh, followed by its usage (see the worked example after the option list).
hadoop jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-extras.jar org.apache.hadoop.tools.DistCh
java org.apache.hadoop.tools.DistCh [OPTIONS] <path:owner:group:permission>
The values of owner, group and permission can be empty.
Permission is a octal number.
OPTIONS:
-f <urilist_uri> Use list at <urilist_uri> as src list
-i Ignore failures
-log <logdir> Write logs to <logdir>
Generic options supported are
-conf <configuration file> specify an application configuration file
-D <property=value> use value for given property
-fs <local|namenode:port> specify a namenode
-jt <local|resourcemanager:port> specify a ResourceManager
-files <comma separated list of files> specify comma separated files to be copied to the map reduce cluster
-libjars <comma separated list of jars> specify comma separated jar files to include in the classpath.
-archives <comma separated list of archives> specify comma separated archives to be unarchived on the compute machines.
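A worked example (the path, owner, group, and mode below are hypothetical; the path:owner:group:permission format is as described above, and any field may be left empty to keep its current value):
# Change owner to hdfs, group to hadoop, and permissions to 755 under /data/warehouse
hadoop jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-extras.jar org.apache.hadoop.tools.DistCh /data/warehouse:hdfs:hadoop:755
# Change only the owner, leaving group and permissions untouched
hadoop jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-extras.jar org.apache.hadoop.tools.DistCh /data/warehouse:hdfs::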
11-08-2017
08:41 PM
4 Kudos
Short Description: The Spark HBase Connector (SHC) is currently hosted in the Hortonworks repo and published as a Spark package.
Below is a simple example of how to access an HBase table in the Spark shell and load the data into a DataFrame. Once the data is in a DataFrame, we can use SQLContext to run queries on it.
Article
The documentation here leaves out a few pieces needed to access HBase tables using SHC with the Spark shell, so here is an example of accessing the HBase "emp" table in the Spark shell.
HBase Shell
Create a simple "emp" HBase table using the HBase shell and insert sample data:
create 'emp', 'personal data', 'professional data'
put 'emp','1','personal data:name','raju'
put 'emp','1','personal data:city','hyderabad'
put 'emp','1','professional data:designation','manager'
put 'emp','1','professional data:salary','50000'
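Before moving to Spark, a quick non-interactive scan confirms the rows landed in HBase as expected:
echo "scan 'emp'" | hbase shell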
Once created, exit the HBase shell and run the Spark shell, providing the SHC package and hbase-site.xml:
/usr/hdp/current/spark-client/bin/spark-shell --packages zhzhan:shc:0.0.11-1.6.1-s_2.10 --files /etc/hbase/conf/hbase-site.xml
Import the required classes:
scala> import org.apache.spark.sql.{SQLContext, _}
import org.apache.spark.sql.{SQLContext, _}
scala> import org.apache.spark.sql.execution.datasources.hbase._
import org.apache.spark.sql.execution.datasources.hbase._
scala> import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.{SparkConf, SparkContext}
Define the HBase catalog for mapping the table; the rowkey is also defined as a column (empNumber) with a specific column family (rowkey).
scala> def empcatalog = s"""{
"table":{"namespace":"default", "name":"emp"},
"rowkey":"key",
"columns":{
"empNumber":{"cf":"rowkey", "col":"key", "type":"string"},
"city":{"cf":"personal data", "col":"city", "type":"string"},
"empName":{"cf":"personal data", "col":"name", "type":"string"},
"jobDesignation":{"cf":"professional data", "col":"designation", "type":"string"},
"salary":{"cf":"professional data", "col":"salary", "type":"string"}
}
}""".stripMargin
Perform DataFrame operations on top of the HBase table: first we define a helper, then load the data into a DataFrame.
scala> def withCatalog(empcatalog: String): DataFrame = {
sqlContext
.read
.options(Map(HBaseTableCatalog.tableCatalog->empcatalog))
.format("org.apache.spark.sql.execution.datasources.hbase")
.load()
}
withCatalog: (empcatalog: String)org.apache.spark.sql.DataFrame
scala> val df = withCatalog(empcatalog)
df: org.apache.spark.sql.DataFrame = [city: string, empName: string, jobDesignation: string, salary: string, empNumber: string]
scala> df.show
17/11/08 18:04:22 INFO RecoverableZooKeeper: Process identifier=hconnection-0x55a690be connecting to ZooKeeper ensemble=vb-atlas-node1.hortonworks.com:2181,vb-atlas-node2.hortonworks.com:2181,vb-atlas-ambari.hortonworks.com:2181
17/11/08 18:04:22 INFO ZooKeeper: Client environment:zookeeper.version=3.4.6-8--1, built on 04/01/201
.
.
.
17/11/08 18:04:24 INFO DAGScheduler: ResultStage 0 (show at <console>:39) finished in 1.011 s
17/11/08 18:04:24 INFO DAGScheduler: Job 0 finished: show at <console>:39, took 1.230151 s
+---------+-------+--------------+------+---------+
| city|empName|jobDesignation|salary|empNumber|
+---------+-------+--------------+------+---------+
| chennai| ravi| manager| 50000| 1|
|hyderabad| raju| engineer| null| 2|
| delhi| rajesh| jrenginner| null| 3|
+---------+-------+--------------+------+---------+
We can query the DataFrame using sqlContext.
scala> df.registerTempTable("table")
scala>sqlContext.sql("select empNumber,jobDesignation from table").show
+---------+--------------+
|empNumber|jobDesignation|
+---------+--------------+
| 1| manager|
| 2| engineer|
| 3| jrenginner|
+---------+--------------+
Reference : https://hortonworks.com/blog/spark-hbase-dataframe-based-hbase-connector/ https://github.com/hortonworks-spark/shc/blob/master/examples/src/main/scala/org/apache/spark/sql/execution/datasources/hbase/HBaseSource.scala
09-15-2017
06:28 PM
@Sree Kupp It is hard to explain without the datanode and namenode logs. Check the datanode logs.
09-14-2017
11:01 PM
@Sree Kupp The above just means the client disconnected after it finished writing. The issue has been fixed in the latest HDP 2.5. Can you check the live nodes in the NameNode UI? Are there 4?
09-06-2017
03:25 PM
@Nick Price Which version ?
09-06-2017
03:21 PM
@M K First you need to enable JMX for Kafka by editing bin/kafka-run-class.sh and setting:
KAFKA_JMX_OPTS="-Dcom.sun.management.jmxremote=true -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Djava.rmi.server.hostname=your.kafka.broker.hostname -Djava.net.preferIPv4Stack=true"
Then update bin/kafka-server-start.sh to add the line below, which sets the JMX port used together with the KAFKA_JMX_OPTS variable:
export JMX_PORT=PORT
Once done, you can follow https://grafana.com/dashboards/721 for Kafka Grafana dashboard metrics.
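For example, with a concrete (hypothetical) port, and a quick check that the broker is listening on it after a restart:
export JMX_PORT=9999
netstat -tlnp | grep 9999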
05-18-2017
03:02 PM
@Juan Manuel Nieto Yes, if it's a kerberized environment you need to provide the keytab to authenticate. Since you are using a shell action you can use kinit too.
05-09-2017
07:58 PM
Hi @Gilsomar N Resende Did you install Hbase and Ambari infra (Solr) on the cluster before Atlas?
05-05-2017
03:30 PM
@Saba Baig Are you using Apache Hadoop or HDP? To get the whole hadoop classpath, you just need to run the following command on your machine; you can copy the output and pass it to HADOOP_CLASSPATH.
hadoop classpath
Example :
[root@vb-atlas-node2 ~]# hadoop classpath
/usr/hdp/2.6.0.3-8/hadoop/conf:/usr/hdp/2.6.0.3-8/hadoop/lib/*:/usr/hdp/2.6.0.3-8/hadoop/.//*:/usr/hdp/2.6.0.3-8/hadoop-hdfs/./:/usr/hdp/2.6.0.3-8/hadoop-hdfs/lib/*:/usr/hdp/2.6.0.3-8/hadoop-hdfs/.//*:/usr/hdp/2.6.0.3-8/hadoop-yarn/lib/*:/usr/hdp/2.6.0.3-8/hadoop-yarn/.//*:/usr/hdp/2.6.0.3-8/hadoop-mapreduce/lib/*:/usr/hdp/2.6.0.3-8/hadoop-mapreduce/.//*::mysql-connector-java-5.1.17.jar:mysql-connector-java.jar:/usr/hdp/2.6.0.3-8/tez/*:/usr/hdp/2.6.0.3-8/tez/lib/*:/usr/hdp/2.6.0.3-8/tez/conf
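Or set it directly in one step:
export HADOOP_CLASSPATH=$(hadoop classpath)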
To install hadoop client and hive client you can go to Ambari and select hosts -> select the atlas hostname -> components -> Add ( hive client, hadoop client )
05-04-2017
04:54 PM
@Saba Baig
Try installing the Hive client on the Atlas server and running import-hive.sh from there. Hive metadata is imported using the import-hive.sh command, and the script needs the Hadoop and Hive classpath jars.
* For Hadoop jars, please make sure that the environment variable HADOOP_CLASSPATH is set. Another way is to set HADOOP_HOME to point to the root directory of your Hadoop installation.
* Similarly, for Hive jars, set HIVE_HOME to the root of the Hive installation.
* Set the environment variable HIVE_CONF_DIR to the Hive configuration directory.
* Copy <atlas-conf>/atlas-application.properties to the hive conf directory.
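A minimal sketch of that environment setup on an HDP node before invoking the script (the hive-client and conf paths are typical HDP defaults, and the import-hive.sh location varies by Atlas version, so treat these paths as assumptions):
export HADOOP_CLASSPATH=$(hadoop classpath)
export HIVE_HOME=/usr/hdp/current/hive-client
export HIVE_CONF_DIR=/etc/hive/conf
cp <atlas-conf>/atlas-application.properties /etc/hive/conf/
<atlas-install-dir>/hook-bin/import-hive.sh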
05-04-2017
04:51 PM
@Geoffrey Shelton Okot By default Ambari will take care of this, if the install was done using Ambari.
05-04-2017
04:49 PM
@Geoffrey Shelton Okot Try this. It looks like a typo: --allow-principals should be --allow-principal.
/usr/hdp/current/kafka-broker/bin/kafka-acls.sh --topic ATLAS_HOOK --allow-principal user:<atlas_user> --operations All --authorizer-properties "zookeeper.connect=gateway-maxwell.com:2181,namenode-maxwell.com:2181,namenode-maxwell.com:2181"
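You can verify the resulting ACLs afterwards with the --list action of the same tool (a sketch, reusing the same zookeeper.connect string):
/usr/hdp/current/kafka-broker/bin/kafka-acls.sh --list --topic ATLAS_HOOK --authorizer-properties "zookeeper.connect=gateway-maxwell.com:2181,namenode-maxwell.com:2181,namenode-maxwell.com:2181"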
04-26-2017
04:45 PM
@Yanying Gu Can you check if Ambari Infra (Solr) is up? If yes, since it is a Kerberized environment, obtain a SPNEGO ticket and then try to access the Solr UI.
04-25-2017
09:27 PM
@Ward Bekker With the AccessController coprocessor for HBase, only a global administrator can take, clone, or restore a snapshot, and these actions do not capture the ACL rights. This means that restoring a table preserves the ACL rights of the existing table, while cloning a table creates a new table that has no ACL rights until the administrator adds them.
04-25-2017
06:18 PM
@HadoopAdmin India Answering your first question: "What I found is both x and y are able to see all the columns and the tag-based policy does not seem to work." --> This happens when any one Ranger policy grants the permission: since you created a resource-based Ranger policy giving both x and y access in the first place, that policy grants both users access to all columns regardless of the tag-based policy. Try removing that Ranger policy and only user y will be able to access the column.
Question 2: What is the use of AD integration in Atlas? How are AD users used in Atlas? --> You can sync your AD users so they can access the Atlas UI directly and track data governance.
Question 3: What is the Hive hook, and can someone provide more information on it? --> The Atlas Hive hook is used by Hive to support listeners on Hive command execution. It adds/updates/removes entities in Atlas using the model defined in org.apache.atlas.hive.model.HiveDataModelGenerator. The hook submits the request to a thread pool executor to avoid blocking the command execution; the thread submits the entities as messages to the notification server, and the Atlas server reads these messages and registers the entities. Follow these instructions in your Hive setup to add the Hive hook for Atlas.
Question 4: How to create geo-based and time-based policies using Atlas? --> As far as I know, currently you can only integrate your tag sync policies into Atlas.
04-25-2017
05:39 PM
@Shashidhar Janne
You can use Java's SimpleDateFormat, for example:
val format = new java.text.SimpleDateFormat("yyyy-MM-dd")
format.parse("2013-07-06")
Also worth a look: https://github.com/nscala-time/nscala-time/blob/master/README.md
03-23-2017
04:44 PM
Awesome @Ken Jiiii. hive-site.xml should be available across the cluster in /etc/spark/conf (which /usr/hdp/current/spark-client/conf symlinks to), and the Spark client needs to be installed on all the worker nodes for yarn-cluster mode to run, since your Spark driver can run on any worker node and needs the client installed with spark/conf. If you are using Ambari, it takes care of making hive-site.xml available in spark-client/conf/.
03-22-2017
04:45 PM
1 Kudo
Thanks @Ken Jiiii. Looking at your error, the application master failed 2 times due to exit code 15. Did you check that you have placed hive-site.xml in your spark/conf? Also, in your code, can you try removing ".setMaster("local[2]")" since you are running on YARN, and then try running it with:
spark-submit --class com.test.spark.Test --master yarn-cluster hdfs://HDP25/test.jar
03-21-2017
08:57 PM
2 Kudos
@Ken Jiiii You can follow this link, which has an example and a pom.xml. Answering your question "I do not need any cluster or Hortonworks specific things in my pom, right?" - yes, you don't. All those values should be in your code or client configs (core-site.xml, yarn-site.xml).
03-11-2017
02:22 AM
2 Kudos
1. You can update the sharelib with the following jars, or they can be passed directly in the oozie workflow.xml. (Make sure you use the 3.2 version of the datanucleus jars, not 4.x.)
/usr/hdp/current/spark-client/lib/datanucleus-rdbms-3.2.9.jar
/usr/hdp/current/spark-client/lib/datanucleus-core-3.2.10.jar
/usr/hdp/current/spark-client/lib/datanucleus-api-jdo-3.2.6.jar
To copy the jars to the sharelib:
# hdfs dfs -put /usr/hdp/current/spark-client/lib/datanucleus-rdbms-3.2.9.jar /user/oozie/share/lib/lib_*/spark/
If you copy the jars to the sharelib, make sure to run the oozie sharelibupdate.
Update oozie sharelib: # oozie admin -oozie http://<oozie-server>:11000/oozie -sharelibupdate
Verify the current spark action sharelib with all the above files: # oozie admin -oozie http://<oozie-server>:11000/oozie -shareliblist spark*
Make sure you have hive-site.xml in sharelib too and have the following properties in it. Replace the values with your hive-site.xml values. <configuration>
<property>
<name>hive.metastore.kerberos.keytab.file</name>
<value>/etc/security/keytabs/hive.service.keytab</value>
</property>
<property>
<name>hive.metastore.kerberos.principal</name>
<value>hive/_HOST@SANDBOX.COM</value>
</property>
<property>
<name>hive.metastore.sasl.enabled</name>
<value>true</value>
</property>
<property>
<name>hive.metastore.uris</name>
<value>thrift://vb-atlas-node1.hortonworks.com:9083</value>
</property>
<property>
<name>hive.server2.authentication</name>
<value>KERBEROS</value>
</property>
<property>
<name>hive.server2.authentication.kerberos.keytab</name>
<value>/etc/security/keytabs/hive.service.keytab</value>
</property>
<property>
<name>hive.server2.authentication.kerberos.principal</name>
<value>hive/_HOST@SANDBOX.COM</value>
</property>
<property>
<name>hive.server2.authentication.spnego.keytab</name>
<value>/etc/security/keytabs/spnego.service.keytab</value>
</property>
<property>
<name>hive.server2.authentication.spnego.principal</name>
<value>HTTP/_HOST@SANDBOX.COM</value>
</property>
<property>
<name>hive.metastore.warehouse.dir</name>
<value>/apps/hive/warehouse</value>
</property>
<property>
<name>hive.metastore.cache.pinobjtypes</name>
<value>Table,Database,Type,FieldSchema,Order</value>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
</property>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://vb-atlas-node1.hortonworks.com/hive?createDatabaseIfNotExist=true</value>
</property>
</configuration>
2. Create a workflow.xml; please make sure you replace the metastore URL and the jar's location.
<workflow-app name="spark-wf" xmlns="uri:oozie:workflow:0.5">
  <credentials>
    <credential name='hcat_auth' type='hcat'>
      <property>
        <name>hcat.metastore.uri</name>
        <value>thrift://vb-atlas-node1.hortonworks.com:9083</value>
      </property>
      <property>
        <name>hcat.metastore.principal</name>
        <value>hive/_HOST@SANDBOX.COM</value>
      </property>
    </credential>
  </credentials>
  <start to="spark-action"/>
  <action name="spark-action" cred='hcat_auth'>
    <spark xmlns="uri:oozie:spark-action:0.1">
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <prepare>
        <delete path="${nameNode}/user/spark/sparkOozie/output-data/spark"/>
      </prepare>
      <master>${master}</master>
      <name>Spark Hive Example</name>
      <class>com.hortonworks.vinod.SparkSqlExample</class>
      <jar>${nameNode}/user/{User_You_run_as}/lib/Spark-Example-vinod-0.0.1-SNAPSHOT.jar</jar>
      <spark-opts>--driver-memory 512m --executor-memory 512m --num-executors 1 --jars /usr/hdp/current/spark-client/lib/datanucleus-rdbms-3.2.9.jar,/usr/hdp/current/spark-client/lib/datanucleus-core-3.2.10.jar,/usr/hdp/current/spark-client/lib/datanucleus-api-jdo-3.2.6.jar --files /usr/hdp/current/spark-client/conf/hive-site.xml</spark-opts>
      <arg>thrift://vb-atlas-node1.hortonworks.com:9083</arg>
    </spark>
    <ok to="end"/>
    <error to="kill"/>
  </action>
  <kill name="kill">
    <message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
  </kill>
  <end name="end"/>
</workflow-app>
3. Upload the jar that runs the program and the input file to the home path of the user you run the oozie job as, and upload workflow.xml to HDFS. For example:
# hdfs dfs -put Spark-Example-vinod-0.0.1-SNAPSHOT.jar /user/{User_You_run_as}/lib/Spark-Example-vinod-0.0.1-SNAPSHOT.jar
# hdfs dfs -put input.txt /user/{User_You_run_as}/
# hdfs dfs -put workflow.xml /user/{User_You_run_as}/
4. Configure job.properties:
nameNode=hdfs://<namenode_HOST>:8020
jobTracker=<Resource_Manager_Host>:8050
oozie.wf.application.path=/user/{User_You_run_as}/
oozie.use.system.libpath=true
master=yarn-cluster
5. Run the oozie job with the properties:
# oozie job -oozie http://<oozie-server>:11000/oozie/ -config job.properties -run
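You can then track the job status with the standard Oozie CLI (the job id is the one printed by the -run command):
# oozie job -oozie http://<oozie-server>:11000/oozie -info <job_id>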
You should see the Spark Hive Example application in the Resource Manager, and the output will be in stdout:
Log Type: stdout
Log Upload Time: Fri Mar 10 22:30:16 +0000 2017
Log Length: 99
+---+-------+
| id|   name|
+---+-------+
|  1|sample1|
|  2|sample2|
|  3|sample3|
+---+-------+
6. com.hortonworks.vinod.SparkSqlExample.class
package com.hortonworks.vinod;
import java.io.IOException;
import javax.security.auth.login.Configuration;
import org.apache.spark.SparkConf;
import org.apache.spark.SparkContext;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.hive.HiveContext;
public class SparkSqlExample {
public static void main(String[] args) throws IOException {
org.apache.hadoop.conf.Configuration conf = new org.apache.hadoop.conf.Configuration();
conf.addResource("/etc/hadoop/conf/core-site.xml");
conf.addResource("/etc/hadoop/conf/hdfs-site.xml");
conf.addResource("/etc/hive/conf/hive-site.xml");
FileSystem fs = FileSystem.get(conf);
SparkConf sparkConf = new SparkConf().setAppName("JavaSparkSQL");
SparkContext sss = new SparkContext(sparkConf);
// JavaSparkContext ctx = new JavaSparkContext(sparkConf);
HiveContext hivecontex = new HiveContext(sss);
hivecontex.sql("create external table if not exists SparkHiveExample ( id int, name string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TextFile");
hivecontex.sql("LOAD DATA INPATH 'input.txt' OVERWRITE INTO TABLE SparkHiveExample");
DataFrame df = hivecontex.sql("select * from SparkHiveExample");
df.show();
}
}
7. pom.xml
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>com.hortonworks.sparkExample</groupId>
<artifactId>Spark-Example-vinod</artifactId>
<version>0.0.1-SNAPSHOT</version>
<name>Spark Examples</name>
<description>Spark programs </description>
<parent>
<groupId>org.apache.spark</groupId>
<artifactId>spark-parent_2.10</artifactId>
<version>1.6.2</version>
</parent>
<dependencies>
<!-- https://mvnrepository.com/artifact/org.apache.spark/spark-core_2.10 -->
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.10</artifactId>
<version>1.6.0</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.spark/spark-hive_2.10 -->
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-hive_2.10</artifactId>
<version>1.6.1</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.scala-lang/scala-library -->
<dependency>
<groupId>org.scala-lang</groupId>
<artifactId>scala-library</artifactId>
<version>2.10.6</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-mapreduce-client-core -->
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-mapreduce-client-core</artifactId>
<version>2.7.0</version>
</dependency>
</dependencies>
</project>
02-13-2017
09:01 PM
12 Kudos
As we know, many services like Atlas (lineage), Ranger (audit logs), Log Search and so on use Ambari Infra (Solr) for indexing data, so moving Ambari Infra into production and keeping it stable and up is really important. These are the key points I came up with to make this happen.
Hardware - Try to have a minimum of 3 Ambari Infra nodes with at least 1-2TB of disk for Solr data storage, though the right size mainly depends on how many components (Ranger, Atlas, Log Search, ...) and how much data will be fed into Solr for indexing. A major driving factor for Solr performance is RAM. Solr requires sufficient memory for two separate things: one is the Java heap, the other is free memory for the OS disk cache. Let's say that you have a Solr index size of 8GB. If your OS, Solr's Java heap, and all other running programs require 4GB of memory, then an ideal memory size for that server is at least 12GB. So how much memory do I need for Ambari Infra? This is one of those questions that has no generic answer. You want a heap that's large enough that you don't have OOM exceptions and problems with constant garbage collection, but small enough that you're not wasting memory or running into huge garbage collection pauses. Ideally we can start with 8GB total memory (leaving 4GB for disk cache) initially, but that also might NOT be enough. The really important thing is to ensure that there is a high cache hit ratio on the OS disk cache.
GC - GC pauses are usually caused by full garbage collections, i.e. pauses of all program execution to clean up memory. GC tuning is an art form, and what works for one person may not work for you. Using the ConcurrentMarkSweep (CMS) collector with tuning parameters is a very good option for Solr, but with the latest Java 7 releases (7u72 at the time of this writing), G1 is looking like a better option if the -XX:+ParallelRefProcEnabled option is used. Information from Oracle engineers who specialize in GC indicates that the latest Java 8 will noticeably improve G1 performance over Java 7, but that has not been confirmed. Here are some ideas that hopefully you will find helpful:
The "MaxNewSize" should not be low: because the application uses caches, setting it to a low value will cause the temporary cache data to be moved to the Old Generation prematurely. Once objects are moved to the Old Gen, they only get cleared during a full GC phase, and until then they remain in the heap. In general we should set "MaxNewSize" (the young generation heap size) to at least 1/6 (recommended) or 1/8 of the max heap. If the application creates many more short-lived temporary objects, MaxNewSize can be increased further. Example : -Xmx8192m -Xms8192m -XX:MaxNewSize=1365m
Because the throughput collector normally starts a GC cycle only when the heap is full (or reaches max), the CMS collector needs to start a GC cycle much earlier than the throughput collector in order to finish before the application runs out of memory. Setting -XX:CMSInitiatingOccupancyFraction=65 -XX:+UseCMSInitiatingOccupancyOnly will help reduce long GC pauses, because it lets the JVM proactively clean the heap when it reaches 65% instead of waiting for it to be 90% full and above.
Zookeeper - As we know, Solr uses Zookeeper to manage configs and coordination. Solr doesn't use Zookeeper that intensively compared to other services (Kafka, services HA, ...). Since SolrCloud relies on Zookeeper, it can be very unstable if you have underlying performance issues that result in operations taking longer than the zkClientTimeout. Increasing that timeout can help, but addressing the underlying performance issues will yield better results. The default timeout of 30 seconds should be more than enough for a well-tuned SolrCloud. We always strongly recommend storing the Zookeeper data on physical disks separate from other services and the OS. Having dedicated machines when multiple services use ZK is even better, but not a requirement.
Availability - Having multiple shards with replication helps to keep the Solr collections available in most cases, such as nodes going down. By default most of the collections are created with 1 shard and 1 replica. We can use the following commands to split a shard or recreate a collection with multiple shards. For example, for the Ranger audit log we can split the existing shard or recreate the collection; if it is a new install or in the initial stages, I would delete and recreate the collection.
To delete the ranger_audits collection:
http://vb-atlas-ambari.hortonworks.com:8886/solr/admin/collections?action=delete&name=ranger_audits
If you don't have the Solr UI enabled or accessible, you can use the spnego principal and run the below command from the command line:
curl -i --negotiate -u : "http://vb-atlas-ambari.hortonworks.com:8886/solr/admin/collections?action=delete&name=ranger_audits"
Create a new ranger_audits collection:
http://vb-atlas-ambari.hortonworks.com:8886/solr/admin/collections?action=create&name=ranger_audits&numShards=3&replicationFactor=2&collection.configName=ranger_audits
Or from the command line:
curl -i --negotiate -u : "http://vb-atlas-ambari.hortonworks.com:8886/solr/admin/collections?action=create&name=ranger_audits&numShards=3&replicationFactor=2&collection.configName=ranger_audits"
You can also specify the Solr nodes where your shards can land:
http://vb-atlas-ambari.hortonworks.com:8886/solr/admin/collections?action=create&name=ranger_audits&numShards=3&replicationFactor=2&collection.configName=ranger_audits&createNodeSet=xhadambum1p.hortonworks.com:8886/solr,xhadambum2p.hortonworks.com:8886/solr,xhadambum3p.hortonworks.com:8886/solr
NOTE: Since we are using the same collection.configName, we don't need to provide the configs again for the collection.
Split Shard
The below command splits shard1 into 2 shards, shard1_0 and shard1_1:
http://vb-atlas-ambari.hortonworks.com:8886/solr/admin/collections?collection=ranger_audits&shard=shard1&action=SPLITSHARD
Disk Space
Sometimes having a high expiration for documents can fill up the disk space in case of heavy traffic, so configuring the right TTL can eliminate this kind of disk space alert. For example, by default ranger_audits has a 90-day TTL; this can be changed if needed.
If you haven't used Solr audits before and haven't enabled Ranger audits to Solr via Ambari yet, it is easy to adjust the TTL configuration. By default Ranger keeps its solrconfig.xml in /usr/hdp/2.5.0.0-1245/ranger-admin/contrib/solr_for_audit_setup/conf/solrconfig.xml, so you can directly edit that solrconfig.xml file and change +90DAYS to the desired value.
<updateRequestProcessorChain name="add-unknown-fields-to-the-schema">
<processor>
<str name="fieldName">_ttl_</str>
<str name="value">+60DAYS</str>
</processor>
<processor>
<int name="autoDeletePeriodSeconds">86400</int>
<str name="ttlFieldName">_ttl_</str>
<str name="expirationFieldName">_expire_at_</str>
</processor>
<processor>
<str name="fieldName">_expire_at_</str>
</processor>
Afterwards, you can go to Ambari and enable Ranger Solr audits; the collection that is going to be created will use the new setting.
If you already configured Ranger audits to Solr, go to one of the Ambari Infra nodes that hosts a Solr instance. You can download the solrconfig.xml or change the existing one for the component you have.
To download:
/usr/lib/ambari-infra-solr/server/scripts/cloud-scripts/zkcli.sh -cmd getfile /infra-solr/configs/ranger_audits/solrconfig.xml solrconfig.xml -z vb-atlas-ambari.hortonworks.com:2181
Edit the downloaded solrconfig.xml and change the TTL, then upload the config back to Zookeeper:
/usr/lib/ambari-infra-solr/server/scripts/cloud-scripts/zkcli.sh -cmd putfile /infra-solr/configs/ranger_audits/solrconfig.xml solrconfig.xml -z vb-atlas-ambari.hortonworks.com:2181
Reload the config:
http://vb-atlas-ambari.hortonworks.com:8886/solr/admin/collections?action=RELOAD&name=ranger_audits
Or from the command line:
curl -v --negotiate -u : "http://vb-atlas-ambari.hortonworks.com:8886/solr/admin/collections?action=RELOAD&name=ranger_audits"
After changing the TTL from +90DAYS to +60DAYS you can verify the documents:
curl -i --negotiate -u : "http://vb-atlas-ambari.hortonworks.com:8886/solr/ranger_audits_shard1_replica1/select?q=_ttl_%3A%22%2B60DAYS%22%0A&wt=json&indent=true"
or from the Solr query UI, set q to _ttl_:"+60DAYS"
{
  "responseHeader":{
    "status":0,
    "QTime":6,
    "params":{
      "q":"_ttl_:\"+60DAYS\"\n",
      "indent":"true",
      "wt":"json"}},
  "response":{"numFound":38848,"start":0,"docs":[
      {
        "id":"004fa587-c531-429a-89a6-acf947d93c39-70574",
        "access":"WRITE",
        "enforcer":"hadoop-acl",
        "repo":"vinodatlas_hadoop",
        "reqUser":"spark",
        "resource":"/spark-history/.133f95bb-655f-450f-8aea-b87288ee2748",
        "cliIP":"172.26.92.153",
        "logType":"RangerAudit",
        "result":1,
        "policy":-1,
        "repoType":1,
        "resType":"path",
        "reason":"/spark-history",
        "action":"write",
        "evtTime":"2017-02-08T23:08:08.103Z",
        "seq_num":105380,
        "event_count":1,
        "event_dur_ms":0,
        "_ttl_":"+60DAYS",
        "_expire_at_":"2017-04-09T23:08:09.406Z",
        "_version_":1558808142185234432},
      {
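To see how shards and replicas are laid out after creating or splitting collections, the Collections API CLUSTERSTATUS action is handy (same Solr host placeholder as in the examples above):
curl -i --negotiate -u : "http://vb-atlas-ambari.hortonworks.com:8886/solr/admin/collections?action=CLUSTERSTATUS&collection=ranger_audits"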
01-01-2017
12:15 AM
Setting up Log Search SSL and HTTPS A keystore and a truststore are required for this setup. These instructions assume that you have already created .jks for the keystore and truststore 1. Create keystore location a. Keystore Setup Place the keystore in /etc/security/certs/ and else by using a symlink you can point to another location of your keystore.jks b. Ensure log search user can read the keystore chown logsearch:hadoop *.keyStore.jks 2. Create a truststore for logsearch: a. Cert signed by CA: i. Copy the keystore into <host>.trustStore.jks ii. Create a symlink to this similar to the keystore /etc/security/certs/truststore.jks -> /etc/security/certs/<host>.trustStore.jks b. Ensure log search user can read the trust store chown logsearch:hadoop *.trustStore.jks 3. Update Ambari configuration a. Update logsearch UI Protocol to https b. Update Trust store location ( logsearch_truststore_location ) and password c. Update Keystore location ( logsearch_keystore_location ) and password 4. Restart log search server UPDATE Logsearch Alert in Ambari Once the Log Search is configured to be accessed using SSL, the following steps are to be performed to update the Alert Definition of "Log Search Web UI" to check https URL. Note: Please replace the variables with appropriate values for your cluster ( Admin credentials, Ambari host and Cluster name ) 1. GET Alert Definition ID. Execute the below command, by replacing the variables with appropriate values. and search for logsearch_ui section curl -s -k -u $AMB_USER:$AMB_PASS -H 'X-Requested-By: ambari' -X GET http://<Ambari_HOST>:8443/api/v1/clusters/<CLUSTER_NAME>/alert_definitions Sample output for logsearch_ui section: { "href" : "http://sandbox.hortonworks.com:8443/api/v1/clusters/sandbox/alert_definitions/451", "AlertDefinition" : { "cluster_name" : “sandbox", "id" : 451, "label" : "Log Search Web UI", "name" : "logsearch_ui" } }, 2. GET the Alert Definition. Use the href value from the above step's sample output to get the Alert Definition of "Log Search Web UI" by executing the below command. curl -s -k -u $AMB_USER:$AMB_PASS -H 'X-Requested-By: ambari' -X GET http://sandbox.hortonworks.com:8443/api/v1/clusters/sandbox/alert_definitions/451 Sample Output: { "href" : "http://sandbox.hortonworks.com.com:8443/api/v1/clusters/sandbox/alert_definitions/451", "AlertDefinition" : { "cluster_name" : “sandbox", "component_name" : "LOGSEARCH_SERVER", "description" : "This host-level alert is triggered if the Log Search UI is unreachable.", "enabled" : true, "help_url" : null, "id" : 451, "ignore_host" : false, "interval" : 1, "label" : "Log Search Web UI", "name" : "logsearch_ui", "repeat_tolerance" : 1, "repeat_tolerance_enabled" : false, "scope" : "ANY", "service_name" : "LOGSEARCH", "source" : { "reporting" : { "critical" : { "text" : "Connection failed to {1} ({3})" }, "ok" : { "text" : "HTTP {0} response in {2:.3f}s" }, "warning" : { "text" : "HTTP {0} response from {1} in {2:.3f}s ({3})" } }, "type" : "WEB", "uri" : { "http": "{{logsearch-env/logsearch_ui_port}}", "https": "{{logsearch-env/logsearch_ui_port}}", "default_port": 61888, "connection_timeout": 5 } } } } 3. Create a temp file with new variables. Create a temp file (in this example: logsearch_uri) with below contents to update the URI sections to include https_property and https_property_value variables and values. 
logsearch_uri file contents: { "AlertDefinition": { "source": { "reporting": { "ok": { "text": "HTTP {0} response in {2:.3f}s" }, "warning": { "text": "HTTP {0} response from {1} in {2:.3f}s ({3})" }, "critical": { "text": "Connection failed to {1} ({3})" } }, "type": "WEB", "uri": { "http": "{{logsearch-env/logsearch_ui_port}}", "https": "{{logsearch-env/logsearch_ui_port}}", "https_property": "{{logsearch-env/logsearch_ui_protocol}}", "https_property_value": "https", "default_port": 61888, "connection_timeout": 5 } } } } 4. PUT the updated Alert Definition. Execute the below command to update the Alert Definition using logsearch_uri file created in the previous step. There will be no output to displayed after the execution of this command. curl -s -k -u $AMB_USER:$AMB_PASS -H 'X-Requested-By: ambari' -X PUT -d @logsearch_uri http://sandbox.hortonworks.com:8443/api/v1/clusters/sandbox/alert_definitions/451 5. Validate the update Execute again the get Alert Definition command (as below) and verify the https_property and https_propert_value are now part of uri section. curl -s -k -u $AMB_USER:$AMB_PASS -H 'X-Requested-By: ambari' -X GET http://sandbox.hortonworks.com:8443/api/v1/clusters/sandbox/alert_definitions/451 Sample Output: { "href" : "http://sandbox.hortonworks.com:8443/api/v1/clusters/sandbox/alert_definitions/451", "AlertDefinition" : { "cluster_name" : “sandbox", "component_name" : "LOGSEARCH_SERVER", "description" : "This host-level alert is triggered if the Log Search UI is unreachable.", "enabled" : true, "help_url" : null, "id" : 451, "ignore_host" : false, "interval" : 1, "label" : "Log Search Web UI", "name" : "logsearch_ui", "repeat_tolerance" : 1, "repeat_tolerance_enabled" : false, "scope" : "ANY", "service_name" : "LOGSEARCH", "source" : { "reporting" : { "critical" : { "text" : "Connection failed to {1} ({3})" }, "ok" : { "text" : "HTTP {0} response in {2:.3f}s" }, "warning" : { "text" : "HTTP {0} response from {1} in {2:.3f}s ({3})" } }, "type" : "WEB", "uri" : { "http": "{{logsearch-env/logsearch_ui_port}}", "https": "{{logsearch-env/logsearch_ui_port}}", "https_property": "{{logsearch-env/logsearch_ui_protocol}}", "https_property_value": "https", "default_port": 61888, "connection_timeout": 5 } } } } NOTE: In the first if you had disabled Alert Definition for "Log Search Web UI" in Ambari, then Enable it again, else wait for the time interval for alert check to execute.
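The steps at the top assume the keystore and truststore .jks files already exist. If you still need to create a self-signed pair for testing, something along these lines works (the alias, validity, and filenames here are examples, not requirements):
keytool -genkeypair -alias logsearch -keyalg RSA -keysize 2048 -validity 365 -keystore /etc/security/certs/<host>.keyStore.jks
keytool -exportcert -alias logsearch -file logsearch.crt -keystore /etc/security/certs/<host>.keyStore.jks
keytool -importcert -alias logsearch -file logsearch.crt -keystore /etc/security/certs/<host>.trustStore.jks
chown logsearch:hadoop /etc/security/certs/*.jks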
12-30-2016
03:35 PM
1 Kudo
SUMMARY: How to enable performance logging in Atlas, where we can track each operation's time, such as the time taken to get entities or to get lineage info. It helps while debugging if the Atlas UI or API is taking a long time to return results: we can check which phase is slow and debug accordingly. Example :
2016-12-20 14:24:02,344|qtp1381713434-59648 - ce3e660e-bdcb-4656-805d-7a99d0b9ddb6|PERF|EntityResource.getEntityDefinition()|452
2016-12-20 14:24:02,432|qtp1381713434-59901 - d15c9039-945a-4a87-abf2-017fdde22ad6|PERF|EntityResource.getEntityDefinition()|6
2016-12-20 14:24:02,553|qtp1381713434-59893 - d9624e31-6c1f-4900-8269-e9f14dfb0a09|PERF|EntityResource.getAuditEvents(03b90ea3-a307-4cfd-ba93-79a2a7cbadf8, null, 26)|117
2016-12-20 14:24:02,643|qtp1381713434-59896 - 775b3108-49e4-4c69-af65-b028a21b26b3|PERF|LineageResource.schema(03b90ea3-a307-4cfd-ba93-79a2a7cbadf8)|207
2016-12-20 14:24:03,176|qtp1381713434-59894 - 98047e2d-181b-4a41-bdf1-4d273a4cc7a3|PERF|LineageResource.inputsGraph(03b90ea3-a307-4cfd-ba93-79a2a7cbadf8)|750
2016-12-20 14:24:03,936|qtp1381713434-59857 - 1dff70bd-03d8-4f42-a294-440cd19e4d41|PERF|LineageResource.outputsGraph(03b90ea3-a307-4cfd-ba93-79a2a7cbadf8)|732
2016-12-20 14:26:48,452|NotificationHookConsumer thread-0|PERF|EntityResource.deleteEntities()|2184
STEPS:
1. Go to Ambari -> Atlas -> Config -> Advanced -> Atlas-log4j and add the following to atlas-log4j from Ambari:
<appender name="perf_appender">
<param name="file" value="${atlas.log.dir}/atlas_perf.log" />
<param name="datePattern" value="'.'yyyy-MM-dd" />
<param name="append" value="true" />
<layout>
<param name="ConversionPattern" value="%d|%t|%m%n" />
</layout>
</appender>
<logger name="org.apache.atlas.perf" additivity="false">
<level value="debug" />
<appender-ref ref="perf_appender" />
</logger>
2. Save your config changes and do the required restarts (restart Atlas).
3. You should now see performance logging in /var/log/atlas/atlas_perf.log