Member since: 12-02-2015
Posts: 42
Kudos Received: 28
Solutions: 3
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 1523 | 04-03-2018 03:30 AM
 | 937 | 04-25-2017 09:27 PM
 | 6561 | 03-22-2017 04:45 PM
04-25-2017
09:27 PM
@Ward Bekker With the AccessController coprocessor for HBase, only a global administrator can take, clone, or restore a snapshot, and these operations do not capture ACL rights. This means that restoring a table preserves the ACL rights of the existing table, while cloning a table creates a new table that has no ACL rights until the administrator adds them.
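For example, after cloning a snapshot the administrator has to re-grant permissions on the new table. A minimal sketch from the HBase shell (user, snapshot, and table names are placeholders):

# hbase shell
clone_snapshot 'my_snapshot', 'cloned_table'
grant 'some_user', 'RWXCA', 'cloned_table'
user_permission 'cloned_table'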
04-25-2017
06:18 PM
@HadoopAdmin India Answering your questions:

Question 1: Both x and y are able to see all the columns, and the tag-based policy does not seem to work. --> If any Ranger policy grants permissions to users x and y, they will be able to access the data it covers. Since you created a Ranger policy giving both x and y access in the first place, both users can read all columns. Try removing that resource-based policy; then only user y will be able to access that column.

Question 2: What is the use of AD integration in Atlas? How are AD users used in Atlas? --> You can sync your AD users so they can log in to the Atlas UI directly and use it to track data governance.

Question 3: What is a Hive hook, and can someone provide more information on it? --> The Atlas Hive hook uses Hive's hook mechanism to listen to Hive command execution. It adds/updates/removes entities in Atlas using the model defined in org.apache.atlas.hive.model.HiveDataModelGenerator. The hook submits the request to a thread-pool executor so it does not block command execution; the thread publishes the entities as messages to the notification server, and the Atlas server reads these messages and registers the entities. A sketch of the Hive configuration needed to enable the hook is below.

Question 4: How do you create geo-based and time-based policies using Atlas? --> As far as I know, currently you can only integrate tag sync policies between Atlas and Ranger.
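As a rough sketch (the hook class and property name are the standard Atlas Hive hook settings; treat the exact jar and conf locations as assumptions that depend on your install), enabling the hook means adding the property below to hive-site.xml, putting the Atlas hook jars on HIVE_AUX_JARS_PATH, and copying atlas-application.properties into the Hive conf directory:

<property>
  <name>hive.exec.post.hooks</name>
  <value>org.apache.atlas.hive.hook.HiveHook</value>
</property>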
03-23-2017
04:44 PM
Awesome @Ken Jiiii. hive-site.xml should be available across the cluster in /etc/spark/conf (which /usr/hdp/current/spark-client/conf is symlinked to), and the Spark client needs to be installed on all worker nodes for yarn-cluster mode, because the Spark driver can run on any worker node and must find the client conf there. If you are using Ambari, it takes care of keeping hive-site.xml in the spark-client conf directory.
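A quick way to sanity-check this on the worker nodes (hostnames below are placeholders):

for h in worker1 worker2 worker3; do ssh $h 'readlink -f /usr/hdp/current/spark-client/conf; ls -l /etc/spark/conf/hive-site.xml'; done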
03-22-2017
04:45 PM
1 Kudo
Thanks @Ken Jiiii. Looking at your error, the application master failed 2 times due to exit code 15. Check whether you have placed hive-site.xml in /spark/conf, and in your code try removing .setMaster("local[2]") since you are running on YARN. Then try running it with: spark-submit --class com.test.spark.Test --master yarn-cluster hdfs://HDP25/test.jar
03-21-2017
08:57 PM
2 Kudos
@Ken Jiiii You can follow this link, which has an example and a pom.xml. Answering your question "I do not need any cluster or Hortonworks specific things in my pom, right?": correct, you don't. All of those values should come from your code or the client configs (core-site.xml, yarn-site.xml).
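As a minimal sketch of what "client configs" means in practice (paths are the usual HDP defaults and the class/jar names are placeholders): spark-submit picks up core-site.xml and yarn-site.xml from the Hadoop conf directory, so nothing cluster-specific has to go into the pom:

export HADOOP_CONF_DIR=/etc/hadoop/conf
export YARN_CONF_DIR=/etc/hadoop/conf
spark-submit --class <your_main_class> --master yarn-cluster <your-app>.jar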
03-11-2017
02:22 AM
2 Kudos
1. You can update the sharelib with the following jars, or pass them directly in the Oozie workflow.xml. (Make sure you use the 3.2 versions of the datanucleus jars, not 4.x.)
/usr/hdp/current/spark-client/lib/datanucleus-rdbms-3.2.9.jar
/usr/hdp/current/spark-client/lib/datanucleus-core-3.2.10.jar
/usr/hdp/current/spark-client/lib/datanucleus-api-jdo-3.2.6.jar
To copy the jars to the sharelib:
# hdfs dfs -put /usr/hdp/current/spark-client/lib/datanucleus-rdbms-3.2.9.jar /user/oozie/share/lib/lib_*/spark/
If you copy the jars to the sharelib, make sure you run an oozie sharelibupdate afterwards.
Update oozie sharelib: # oozie admin -oozie http://<oozie-server>:11000/oozie -sharelibupdate
Verify the current spark action sharelib with all the above files: # oozie admin -oozie http://<oozie-server>:11000/oozie -shareliblist spark*
Make sure you have hive-site.xml in the sharelib too, with the following properties in it (replace the values with those from your own hive-site.xml).
<configuration>
<property>
<name>hive.metastore.kerberos.keytab.file</name>
<value>/etc/security/keytabs/hive.service.keytab</value>
</property>
<property>
<name>hive.metastore.kerberos.principal</name>
<value>hive/_HOST@SANDBOX.COM</value>
</property>
<property>
<name>hive.metastore.sasl.enabled</name>
<value>true</value>
</property>
<property>
<name>hive.metastore.uris</name>
<value>thrift://vb-atlas-node1.hortonworks.com:9083</value>
</property>
<property>
<name>hive.server2.authentication</name>
<value>KERBEROS</value>
</property>
<property>
<name>hive.server2.authentication.kerberos.keytab</name>
<value>/etc/security/keytabs/hive.service.keytab</value>
</property>
<property>
<name>hive.server2.authentication.kerberos.principal</name>
<value>hive/_HOST@SANDBOX.COM</value>
</property>
<property>
<name>hive.server2.authentication.spnego.keytab</name>
<value>/etc/security/keytabs/spnego.service.keytab</value>
</property>
<property>
<name>hive.server2.authentication.spnego.principal</name>
<value>HTTP/_HOST@SANDBOX.COM</value>
</property>
<property>
<name>hive.metastore.warehouse.dir</name>
<value>/apps/hive/warehouse</value>
</property>
<property>
<name>hive.metastore.cache.pinobjtypes</name>
<value>Table,Database,Type,FieldSchema,Order</value>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
</property>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://vb-atlas-node1.hortonworks.com/hive?createDatabaseIfNotExist=true</value>
</property>
</configuration>
2. Create a workflow.xml; make sure you replace the metastore URL and the jar location.

<workflow-app name="spark-wf" xmlns="uri:oozie:workflow:0.5">
  <credentials>
    <credential name='hcat_auth' type='hcat'>
      <property>
        <name>hcat.metastore.uri</name>
        <value>thrift://vb-atlas-node1.hortonworks.com:9083</value>
      </property>
      <property>
        <name>hcat.metastore.principal</name>
        <value>hive/_HOST@SANDBOX.COM</value>
      </property>
    </credential>
  </credentials>
  <start to="spark-action"/>
  <action name="spark-action" cred='hcat_auth'>
    <spark xmlns="uri:oozie:spark-action:0.1">
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <prepare>
        <delete path="${nameNode}/user/spark/sparkOozie/output-data/spark"/>
      </prepare>
      <master>${master}</master>
      <name>Spark Hive Example</name>
      <class>com.hortonworks.vinod.SparkSqlExample</class>
      <jar>${nameNode}/user/{User_You_run_as}/lib/Spark-Example-vinod-0.0.1-SNAPSHOT.jar</jar>
      <spark-opts>--driver-memory 512m --executor-memory 512m --num-executors 1 --jars /usr/hdp/current/spark-client/lib/datanucleus-rdbms-3.2.9.jar,/usr/hdp/current/spark-client/lib/datanucleus-core-3.2.10.jar,/usr/hdp/current/spark-client/lib/datanucleus-api-jdo-3.2.6.jar --files /usr/hdp/current/spark-client/conf/hive-site.xml</spark-opts>
      <arg>thrift://vb-atlas-node1.hortonworks.com:9083</arg>
    </spark>
    <ok to="end"/>
    <error to="kill"/>
  </action>
  <kill name="kill">
    <message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
  </kill>
  <end name="end"/>
</workflow-app>

3. Upload the jar for the program and the input file to the home path of the user you run the Oozie job as, and upload workflow.xml to HDFS.

# hdfs dfs -put Spark-Example-vinod-0.0.1-SNAPSHOT.jar /user/{User_You_run_as}/lib/Spark-Example-vinod-0.0.1-SNAPSHOT.jar
# hdfs dfs -put input.txt /user/{User_You_run_as}/
# hdfs dfs -put workflow.xml /user/{User_You_run_as}/

4. Configure job.properties:

nameNode=hdfs://<namenode_HOST>:8020
jobTracker=<Resource_Manager_Host>:8050
oozie.wf.application.path=/user/{User_You_run_as}/
oozie.use.system.libpath=true
master=yarn-cluster

5. Run the Oozie job with the properties:

# oozie job -oozie http://<oozie-server>:11000/oozie/ -config job.properties -run
You should see "Spark Hive Example" in the Resource Manager, and the output will be in stdout:

Log Type: stdout
Log Upload Time: Fri Mar 10 22:30:16 +0000 2017
Log Length: 99
+---+-------+
| id| name|
+---+-------+
| 1|sample1|
| 2|sample2|
| 3|sample3|
+---+-------+

6. com.hortonworks.vinod.SparkSqlExample.class

package com.hortonworks.vinod;
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.spark.SparkConf;
import org.apache.spark.SparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.hive.HiveContext;

public class SparkSqlExample {

  public static void main(String[] args) throws IOException {
    // Load the Hadoop/Hive client configs so the job can reach HDFS and the metastore.
    Configuration conf = new Configuration();
    conf.addResource("/etc/hadoop/conf/core-site.xml");
    conf.addResource("/etc/hadoop/conf/hdfs-site.xml");
    conf.addResource("/etc/hive/conf/hive-site.xml");
    FileSystem fs = FileSystem.get(conf);

    // No setMaster() here: the master (yarn-cluster) comes from the Oozie spark action.
    SparkConf sparkConf = new SparkConf().setAppName("JavaSparkSQL");
    SparkContext sc = new SparkContext(sparkConf);
    HiveContext hiveContext = new HiveContext(sc);

    // Create the table if needed, load the uploaded input.txt, and show the result.
    hiveContext.sql("create external table if not exists SparkHiveExample ( id int, name string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TextFile");
    hiveContext.sql("LOAD DATA INPATH 'input.txt' OVERWRITE INTO TABLE SparkHiveExample");
    DataFrame df = hiveContext.sql("select * from SparkHiveExample");
    df.show();
  }
}

7. pom.xml

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>com.hortonworks.sparkExample</groupId>
<artifactId>Spark-Example-vinod</artifactId>
<version>0.0.1-SNAPSHOT</version>
<name>Spark Examples</name>
<description>Spark programs </description>
<parent>
<groupId>org.apache.spark</groupId>
<artifactId>spark-parent_2.10</artifactId>
<version>1.6.2</version>
</parent>
<dependencies>
<!-- https://mvnrepository.com/artifact/org.apache.spark/spark-core_2.10 -->
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.10</artifactId>
<version>1.6.0</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.spark/spark-hive_2.10 -->
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-hive_2.10</artifactId>
<version>1.6.1</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.scala-lang/scala-library -->
<dependency>
<groupId>org.scala-lang</groupId>
<artifactId>scala-library</artifactId>
<version>2.10.6</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-mapreduce-client-core -->
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-mapreduce-client-core</artifactId>
<version>2.7.0</version>
</dependency>
</dependencies>
</project>
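To tie the pom and the workflow together, this is roughly how I build and stage the artifact before submitting the Oozie job (the jar name matches the pom above, the target path matches step 3):

# mvn clean package -DskipTests
# hdfs dfs -put target/Spark-Example-vinod-0.0.1-SNAPSHOT.jar /user/{User_You_run_as}/lib/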
02-13-2017
09:01 PM
12 Kudos
As we know, many services such as Atlas (lineage), Ranger (audit logs), and Log Search use Ambari Infra (Solr) for indexing data, so keeping Ambari Infra stable and available in production is really important. These are the key points I came up with to make this happen.

Hardware – Try to have a minimum of 3 Ambari Infra nodes with at least 1-2 TB of disk for Solr data storage, though the real requirement depends on how many components (Ranger, Atlas, Log Search, ...) feed data into Solr for indexing, and how much. A major driving factor for Solr performance is RAM. Solr requires sufficient memory for two separate things: one is the Java heap, the other is free memory for the OS disk cache. Say you have a Solr index size of 8 GB; if your OS, Solr's Java heap, and all other running programs require 4 GB of memory, then an ideal memory size for that server is at least 12 GB.

So how much memory do I need for Ambari Infra? This is one of those questions that has no generic answer. You want a heap that is large enough that you don't hit OOM exceptions and constant garbage collection, but small enough that you're not wasting memory or running into huge garbage-collection pauses. Ideally you can start with 8 GB total memory (leaving 4 GB for disk cache), but even that might NOT be enough. The really important thing is to ensure there is a high cache hit ratio on the OS disk cache.

GC – GC pauses are usually caused by full garbage collections, which pause all program execution to clean up memory. GC tuning is an art form, and what works for one person may not work for you. Using the ConcurrentMarkSweep (CMS) collector with tuning parameters is a very good option for Solr, but with the latest Java 7 releases (7u72 at the time of this writing), G1 is looking like a better option if the -XX:+ParallelRefProcEnabled option is used. Information from Oracle engineers who specialize in GC indicates that the latest Java 8 noticeably improves G1 performance over Java 7, but that has not been confirmed. Here are some ideas that you will hopefully find helpful:
The "MaxNewSize" should not be low, because the applications use caches setting it low value will cause the temporary cache data to me moved to Old Generation prematurely / so quickly. Once the objects are moved to Old gen then only during the complete GC face they will get cleared and till that time they will be present in the heap space. We should set the "MaxNewSize" (young generation heap size) to atleaset 1/6 (recommended) or 1/8 of the MaxHeap in genaral. If our application creates much more temporary objects (short lived) cached then the MaxNewSize can be further increased. Example : -Xmx8192m –Xms8192m –XX:MaxNewSize=1365m
Because the throughput collector normally starts a GC cycle only when the heap is full (or reaches its max), the CMS collector needs to start a GC cycle much earlier in order to finish before the application runs out of memory. Setting -XX:CMSInitiatingOccupancyFraction=65 -XX:+UseCMSInitiatingOccupancyOnly helps reduce long GC pauses, because the JVM proactively cleans the heap when it reaches 65% occupancy instead of waiting for it to be 90%+ full.

Zookeeper – Solr uses Zookeeper to manage configs and coordination. Solr doesn't use Zookeeper that intensively compared to other services (Kafka, services HA, ...). Since SolrCloud relies on Zookeeper, it can be very unstable if you have underlying performance issues that result in operations taking longer than the zkClientTimeout. Increasing that timeout can help, but addressing the underlying performance issues will yield better results. The default timeout of 30 seconds should be more than enough for a well-tuned SolrCloud. As always, we strongly recommend storing the Zookeeper data on physical disks separate from other services and the OS. Having dedicated machines when multiple services use ZK is even better, but not a requirement.

Availability – Having multiple shards with replication helps keep the Solr collections available in most cases, such as nodes going down. By default most of the collections are created with 1 shard and 1 replica. We can use the following commands to split a shard or recreate a collection with multiple shards. Take the Ranger audit log as an example: we can split the existing shard or recreate the collection. If it is a new install or an early stage, I would delete and recreate the collection.

To delete the ranger_audits collection:
http://vb-atlas-ambari.hortonworks.com:8886/solr/admin/collections?action=delete&name=ranger_audits
If you don't have the Solr UI enabled or accessible, you can use the spnego principal and run the command from the command line:
curl -i --negotiate -u : "http://vb-atlas-ambari.hortonworks.com:8886/solr/admin/collections?action=delete&name=ranger_audits"

To create a new ranger_audits collection:
http://vb-atlas-ambari.hortonworks.com:8886/solr/admin/collections?action=create&name=ranger_audits&numShards=3&replicationFactor=2&collection.configName=ranger_audits
Or from the command line:
curl -i --negotiate -u : "http://vb-atlas-ambari.hortonworks.com:8886/solr/admin/collections?action=create&name=ranger_audits&numShards=3&replicationFactor=2&collection.configName=ranger_audits"
You can also specify the Solr nodes where the shards should land:
http://vb-atlas-ambari.hortonworks.com:8886/solr/admin/collections?action=create&name=ranger_audits&numShards=3&replicationFactor=2&collection.configName=ranger_audits&createNodeSet=xhadambum1p.hortonworks.com:8886/solr,xhadambum2p.hortonworks.com:8886/solr,xhadambum3p.hortonworks.com:8886/solr
NOTE: Since we are using the same collection.configName, we don't need to provide the configs again for the collection.

Split Shard – The command below splits shard1 into two shards, shard1_0 and shard1_1:
http://vb-atlas-ambari.hortonworks.com:8886/solr/admin/collections?collection=ranger_audits&shard=shard1&action=SPLITSHARD

Disk Space – Sometimes a high expiration (TTL) for documents can fill up the disk under heavy traffic, so configuring the right TTL can eliminate this kind of disk-space alert. For example, ranger_audits has a 90-day TTL by default, and this can be changed if needed.
If you haven't used Solr audits before and haven't enabled Ranger audits to Solr via Ambari yet, it is easy to adjust the TTL configuration. By default Ranger ships its solrconfig.xml in /usr/hdp/2.5.0.0-1245/ranger-admin/contrib/solr_for_audit_setup/conf/solrconfig.xml, so you can directly edit that solrconfig.xml and change +90DAYS to another value:

<updateRequestProcessorChain name="add-unknown-fields-to-the-schema">
  <processor>
    <str name="fieldName">_ttl_</str>
    <str name="value">+60DAYS</str>
  </processor>
  <processor>
    <int name="autoDeletePeriodSeconds">86400</int>
    <str name="ttlFieldName">_ttl_</str>
    <str name="expirationFieldName">_expire_at_</str>
  </processor>
  <processor>
    <str name="fieldName">_expire_at_</str>
  </processor>

Afterwards, you can go to Ambari and enable Ranger Solr audits; the collection that is then created will use the new setting.

If you already configured Ranger audits to Solr, go to one of the Ambari Infra nodes that hosts a Solr instance. You can download the solrconfig.xml of the component, or change the existing one.

To download:
/usr/lib/ambari-infra-solr/server/scripts/cloud-scripts/zkcli.sh -cmd getfile /infra-solr/configs/ranger_audits/solrconfig.xml solrconfig.xml -z vb-atlas-ambari.hortonworks.com:2181

Edit the downloaded solrconfig.xml and change the TTL, then upload the config back to Zookeeper:
/usr/lib/ambari-infra-solr/server/scripts/cloud-scripts/zkcli.sh -cmd putfile /infra-solr/configs/ranger_audits/solrconfig.xml solrconfig.xml -z vb-atlas-ambari.hortonworks.com:2181

Reload the config:
http://vb-atlas-ambari.hortonworks.com:8886/solr/admin/collections?action=RELOAD&name=ranger_audits
Or from the command line:
curl -v --negotiate -u : "http://vb-atlas-ambari.hortonworks.com:8886/solr/admin/collections?action=RELOAD&name=ranger_audits"

After changing the TTL from +90DAYS to +60DAYS, you can verify it on a document:
curl -i --negotiate -u : "http://vb-atlas-ambari.hortonworks.com:8886/solr/ranger_audits_shard1_replica1/select?q=_ttl_%3A%22%2B60DAYS%22%0A&wt=json&indent=true"
or from the Solr query UI with q set to _ttl_:"+60DAYS"
{
  "responseHeader":{
    "status":0,
    "QTime":6,
    "params":{
      "q":"_ttl_:\"+60DAYS\"\n",
      "indent":"true",
      "wt":"json"}},
  "response":{"numFound":38848,"start":0,"docs":[
      {
        "id":"004fa587-c531-429a-89a6-acf947d93c39-70574",
        "access":"WRITE",
        "enforcer":"hadoop-acl",
        "repo":"vinodatlas_hadoop",
        "reqUser":"spark",
        "resource":"/spark-history/.133f95bb-655f-450f-8aea-b87288ee2748",
        "cliIP":"172.26.92.153",
        "logType":"RangerAudit",
        "result":1,
        "policy":-1,
        "repoType":1,
        "resType":"path",
        "reason":"/spark-history",
        "action":"write",
        "evtTime":"2017-02-08T23:08:08.103Z",
        "seq_num":105380,
        "event_count":1,
        "event_dur_ms":0,
        "_ttl_":"+60DAYS",
        "_expire_at_":"2017-04-09T23:08:09.406Z",
        "_version_":1558808142185234432},
      {
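One extra check I find useful after recreating or splitting a collection (see the Availability section above): the Collections API cluster-status call shows the resulting shard/replica layout. The host below is the same example host used throughout this article:

curl -i --negotiate -u : "http://vb-atlas-ambari.hortonworks.com:8886/solr/admin/collections?action=CLUSTERSTATUS&collection=ranger_audits&wt=json"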
01-01-2017
12:15 AM
Setting up Log Search SSL and HTTPS

A keystore and a truststore are required for this setup. These instructions assume that you have already created the .jks files for the keystore and truststore (a sample keytool sketch is at the end of this post).

1. Create the keystore location
a. Keystore setup: place the keystore in /etc/security/certs/, or use a symlink to point to another location of your keystore.jks.
b. Ensure the logsearch user can read the keystore: chown logsearch:hadoop *.keyStore.jks

2. Create a truststore for Log Search
a. Cert signed by a CA:
i. Copy the keystore into <host>.trustStore.jks
ii. Create a symlink to this, similar to the keystore: /etc/security/certs/truststore.jks -> /etc/security/certs/<host>.trustStore.jks
b. Ensure the logsearch user can read the truststore: chown logsearch:hadoop *.trustStore.jks

3. Update the Ambari configuration
a. Update the Log Search UI protocol to https
b. Update the truststore location (logsearch_truststore_location) and password
c. Update the keystore location (logsearch_keystore_location) and password

4. Restart the Log Search server

UPDATE the Log Search alert in Ambari

Once Log Search is configured to be accessed using SSL, perform the following steps to update the alert definition of "Log Search Web UI" so that it checks the https URL. Note: please replace the variables with the appropriate values for your cluster (admin credentials, Ambari host, and cluster name).

1. Get the alert definition ID. Execute the command below, replacing the variables with appropriate values, and search for the logsearch_ui section.

curl -s -k -u $AMB_USER:$AMB_PASS -H 'X-Requested-By: ambari' -X GET http://<Ambari_HOST>:8443/api/v1/clusters/<CLUSTER_NAME>/alert_definitions

Sample output for the logsearch_ui section:

{
  "href" : "http://sandbox.hortonworks.com:8443/api/v1/clusters/sandbox/alert_definitions/451",
  "AlertDefinition" : {
    "cluster_name" : "sandbox",
    "id" : 451,
    "label" : "Log Search Web UI",
    "name" : "logsearch_ui"
  }
},

2. Get the alert definition. Use the href value from the sample output above to get the alert definition of "Log Search Web UI" by executing the command below.

curl -s -k -u $AMB_USER:$AMB_PASS -H 'X-Requested-By: ambari' -X GET http://sandbox.hortonworks.com:8443/api/v1/clusters/sandbox/alert_definitions/451

Sample output:

{
  "href" : "http://sandbox.hortonworks.com:8443/api/v1/clusters/sandbox/alert_definitions/451",
  "AlertDefinition" : {
    "cluster_name" : "sandbox",
    "component_name" : "LOGSEARCH_SERVER",
    "description" : "This host-level alert is triggered if the Log Search UI is unreachable.",
    "enabled" : true,
    "help_url" : null,
    "id" : 451,
    "ignore_host" : false,
    "interval" : 1,
    "label" : "Log Search Web UI",
    "name" : "logsearch_ui",
    "repeat_tolerance" : 1,
    "repeat_tolerance_enabled" : false,
    "scope" : "ANY",
    "service_name" : "LOGSEARCH",
    "source" : {
      "reporting" : {
        "critical" : { "text" : "Connection failed to {1} ({3})" },
        "ok" : { "text" : "HTTP {0} response in {2:.3f}s" },
        "warning" : { "text" : "HTTP {0} response from {1} in {2:.3f}s ({3})" }
      },
      "type" : "WEB",
      "uri" : {
        "http": "{{logsearch-env/logsearch_ui_port}}",
        "https": "{{logsearch-env/logsearch_ui_port}}",
        "default_port": 61888,
        "connection_timeout": 5
      }
    }
  }
}

3. Create a temp file with the new variables. Create a temp file (in this example: logsearch_uri) with the contents below to update the uri section to include the https_property and https_property_value variables and values.
logsearch_uri file contents:

{
  "AlertDefinition": {
    "source": {
      "reporting": {
        "ok": { "text": "HTTP {0} response in {2:.3f}s" },
        "warning": { "text": "HTTP {0} response from {1} in {2:.3f}s ({3})" },
        "critical": { "text": "Connection failed to {1} ({3})" }
      },
      "type": "WEB",
      "uri": {
        "http": "{{logsearch-env/logsearch_ui_port}}",
        "https": "{{logsearch-env/logsearch_ui_port}}",
        "https_property": "{{logsearch-env/logsearch_ui_protocol}}",
        "https_property_value": "https",
        "default_port": 61888,
        "connection_timeout": 5
      }
    }
  }
}

4. PUT the updated alert definition. Execute the command below to update the alert definition using the logsearch_uri file created in the previous step. No output is displayed after this command executes.

curl -s -k -u $AMB_USER:$AMB_PASS -H 'X-Requested-By: ambari' -X PUT -d @logsearch_uri http://sandbox.hortonworks.com:8443/api/v1/clusters/sandbox/alert_definitions/451

5. Validate the update. Execute the GET alert definition command again (as below) and verify that https_property and https_property_value are now part of the uri section.

curl -s -k -u $AMB_USER:$AMB_PASS -H 'X-Requested-By: ambari' -X GET http://sandbox.hortonworks.com:8443/api/v1/clusters/sandbox/alert_definitions/451

Sample output:

{
  "href" : "http://sandbox.hortonworks.com:8443/api/v1/clusters/sandbox/alert_definitions/451",
  "AlertDefinition" : {
    "cluster_name" : "sandbox",
    "component_name" : "LOGSEARCH_SERVER",
    "description" : "This host-level alert is triggered if the Log Search UI is unreachable.",
    "enabled" : true,
    "help_url" : null,
    "id" : 451,
    "ignore_host" : false,
    "interval" : 1,
    "label" : "Log Search Web UI",
    "name" : "logsearch_ui",
    "repeat_tolerance" : 1,
    "repeat_tolerance_enabled" : false,
    "scope" : "ANY",
    "service_name" : "LOGSEARCH",
    "source" : {
      "reporting" : {
        "critical" : { "text" : "Connection failed to {1} ({3})" },
        "ok" : { "text" : "HTTP {0} response in {2:.3f}s" },
        "warning" : { "text" : "HTTP {0} response from {1} in {2:.3f}s ({3})" }
      },
      "type" : "WEB",
      "uri" : {
        "http": "{{logsearch-env/logsearch_ui_port}}",
        "https": "{{logsearch-env/logsearch_ui_port}}",
        "https_property": "{{logsearch-env/logsearch_ui_protocol}}",
        "https_property_value": "https",
        "default_port": 61888,
        "connection_timeout": 5
      }
    }
  }
}

NOTE: If you had disabled the "Log Search Web UI" alert definition in Ambari earlier, enable it again; otherwise, wait for the alert check interval to elapse so the check executes.
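For completeness, here is a rough sketch of how the keystore and truststore assumed in step 1 could be created with keytool (the alias, passwords, hostname, distinguished name, and validity are placeholders; for a CA-signed certificate you would import the signed chain instead of the self-signed certificate):

# keytool -genkeypair -alias logsearch -keyalg RSA -keysize 2048 -validity 365 -keystore /etc/security/certs/<host>.keyStore.jks -storepass <keystore_password> -dname "CN=<host>, OU=Support, O=Example, L=City, ST=State, C=US"
# keytool -exportcert -alias logsearch -keystore /etc/security/certs/<host>.keyStore.jks -storepass <keystore_password> -file /etc/security/certs/logsearch.crt
# keytool -importcert -alias logsearch -file /etc/security/certs/logsearch.crt -keystore /etc/security/certs/<host>.trustStore.jks -storepass <truststore_password> -noprompt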
12-30-2016
03:35 PM
1 Kudo
SUMMARY: How to enable performance logging in Atlas, so we can track the time taken by each call, such as fetching entities or lineage info. It helps while debugging when the Atlas UI or API takes a long time to return results: we can check which phase is slow and debug accordingly.

Example:
2016-12-20 14:24:02,344|qtp1381713434-59648 - ce3e660e-bdcb-4656-805d-7a99d0b9ddb6|PERF|EntityResource.getEntityDefinition()|452
2016-12-20 14:24:02,432|qtp1381713434-59901 - d15c9039-945a-4a87-abf2-017fdde22ad6|PERF|EntityResource.getEntityDefinition()|6
2016-12-20 14:24:02,553|qtp1381713434-59893 - d9624e31-6c1f-4900-8269-e9f14dfb0a09|PERF|EntityResource.getAuditEvents(03b90ea3-a307-4cfd-ba93-79a2a7cbadf8, null, 26)|117
2016-12-20 14:24:02,643|qtp1381713434-59896 - 775b3108-49e4-4c69-af65-b028a21b26b3|PERF|LineageResource.schema(03b90ea3-a307-4cfd-ba93-79a2a7cbadf8)|207
2016-12-20 14:24:03,176|qtp1381713434-59894 - 98047e2d-181b-4a41-bdf1-4d273a4cc7a3|PERF|LineageResource.inputsGraph(03b90ea3-a307-4cfd-ba93-79a2a7cbadf8)|750
2016-12-20 14:24:03,936|qtp1381713434-59857 - 1dff70bd-03d8-4f42-a294-440cd19e4d41|PERF|LineageResource.outputsGraph(03b90ea3-a307-4cfd-ba93-79a2a7cbadf8)|732
2016-12-20 14:26:48,452|NotificationHookConsumer thread-0|PERF|EntityResource.deleteEntities()|2184

STEPS:
1. Go to Ambari -> Atlas -> Config -> Advanced -> Atlas-log4j and add the following to atlas-log4j:

<appender name="perf_appender">
  <param name="file" value="${atlas.log.dir}/atlas_perf.log" />
  <param name="datePattern" value="'.'yyyy-MM-dd" />
  <param name="append" value="true" />
  <layout>
    <param name="ConversionPattern" value="%d|%t|%m%n" />
  </layout>
</appender>
<logger name="org.apache.atlas.perf" additivity="false">
  <level value="debug" />
  <appender-ref ref="perf_appender" />
</logger>

2. Save your config changes and do the required restarts (restart Atlas).
3. You should now see performance logging in /var/log/atlas/atlas_perf.log
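Since the last pipe-separated field of each PERF line is the elapsed time in milliseconds, a quick way to watch for slow calls is something like the following (the 1000 ms threshold is just an example):

# tail -f /var/log/atlas/atlas_perf.log | awk -F'|' '$5+0 > 1000'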
08-05-2016
01:59 PM
@sankar rao Restarting the Ambari server works as a workaround.