01-02-2018
09:27 PM
1 Kudo
For certain large environments, it's very easy for the Spark History Server (SHS) to get overwhelmed by the large number of applications being posted and the number of users / developers viewing history data. Each Spark job writes an artifact called the history file, which is what the SHS parses and serves via its UI. The size of this file has a huge impact on SHS load; note that the size of the history file is determined by the number of events generated by the application (a small executor heartbeat interval, for example, produces many more events).

Workaround: If you still want to analyze performance issues in these large history files, one option is to download them and browse them from a locally hosted SHS instance. To run this:

1. Download Spark 1.6 from https://spark.apache.org/downloads.html
2. Unpack it
3. Create a directory to hold the logs called spark-logs
4. Create a properties file called test.properties
5. Inside test.properties add spark.history.fs.logDirectory=<path to the spark-logs directory>
6. Run <spark download>/sbin/start-history-server.sh --properties-file <path to test.properties>
7. Open a web browser and visit http://localhost:18080

Once done, you can download Spark history files from HDFS and copy them into this directory. The running Spark History Server will dynamically load the files as they are made available in the spark-logs directory.
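Putting the steps together, a minimal sketch looks like this (the Spark tarball name, local paths, and the HDFS event log location /spark-history are placeholders / assumptions; adjust them to your environment):

# unpack the download and create a local directory to hold history files
tar -xzf spark-1.6.3-bin-hadoop2.6.tgz
mkdir -p /tmp/spark-logs
# point the local SHS at that directory
echo "spark.history.fs.logDirectory=file:///tmp/spark-logs" > /tmp/test.properties
# start the local History Server and browse http://localhost:18080
spark-1.6.3-bin-hadoop2.6/sbin/start-history-server.sh --properties-file /tmp/test.properties
# copy a history file down from HDFS; the SHS picks it up automatically
hdfs dfs -copyToLocal /spark-history/<application_id> /tmp/spark-logs/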
11-01-2017
12:15 AM
Abstract:
Nimbus metrics are critical for operations as well as development teams to monitor the performance and stability of Storm applications / topologies. Most production environments already have a metrics / operations monitoring system such as Solr, Elasticsearch, or a time-series database. This post shows how you can use collectd to forward these metrics to your metrics environment of choice and alert on them.
Solution:
Collectd is a standard metrics collection tool that runs natively on Linux operating systems. It's capable of capturing a wide variety of metrics; you can find more information on collectd here: https://collectd.org/
To capture Storm Nimbus metrics, here's a collectd plugin that needs to be compiled and built with Maven: https://github.com/srotya/storm-collectd. Simply run:
mvn clean package assembly:single
In addition, you will need to install collectd and ensure that it has Java plugin capability. Here's a great post on how to do that:
http://blog.asquareb.com/blog/2014/06/09/enabling-java-plugin-for-collectd/ (Please note that the JAR="/path/to/jar" and JAVAC="/path/to/javac" variables need to be set to valid paths before you can run it.)
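For reference, those two variables are expected to point at the jar and javac tools of the JDK you are building against; the exact paths depend on your JDK install, so treat the values below purely as a hypothetical example:

# adjust to your actual JDK location
JAR="/usr/lib/jvm/java-8-openjdk/bin/jar"
JAVAC="/usr/lib/jvm/java-8-openjdk/bin/javac"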
Once installed, you will need to configure collectd as follows. (Don't forget to also configure an output / write plugin, otherwise the collected metrics won't be shipped anywhere; see the example after the snippet below.)
LoadPlugin java
<Plugin "java">
  # The required JVM argument is the classpath, e.g.:
  # JVMArg "-Djava.class.path=/installpath/collectd/share/collectd/java"
  # Since version 4.8.4 (commit c983405) the API and GenericJMX plugin are
  # provided as .jar files.
  JVMArg "-Djava.class.path=<ABSOLUTE PATH>/lib/collectd-api.jar:<ABSOLUTE PATH>/target/storm-collectd-0.0.1-SNAPSHOT-jar-with-dependencies.jar"
  LoadPlugin "com.srotya.collectd.storm.StormNimbusMetrics"
  <Plugin "storm">
    address "http://localhost:8084/"
    kerberos false
    jaas "<PATH TO JAAS CONF>"
  </Plugin>
</Plugin>
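The output side depends entirely on your metrics environment; any collectd write plugin (write_graphite, write_http, network, etc.) will do. As one hedged example, a Graphite-compatible sink could be configured roughly like this (host, port, and prefix are assumptions):

LoadPlugin write_graphite
<Plugin write_graphite>
  <Node "metrics">
    Host "graphite.example.com"
    Port "2003"
    Protocol "tcp"
    Prefix "collectd."
    StoreRates true
  </Node>
</Plugin>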
10-31-2017
11:59 PM
2 Kudos
Problem: If you have an AD/LDAP environment and are using HDP with Ranger, it's critical to review the case in which usernames and group ids are stored in your directory services environment. Ranger authorization is case sensitive; if the username / group id doesn't match the one returned from the directory (AD/LDAP), authorization will be denied.

Solution: To solve this problem Ranger offers 2 parameters that can be set via Ambari. This should ideally be done at install time to avoid the need to re-sync all users. The Ranger usersync properties for case conversion are:
ranger.usersync.ldap.username.caseconversion and ranger.usersync.ldap.groupname.caseconversion. You can set these properties to lower or upper; this makes sure that Ranger stores usernames and groups in the specified case in its local database, so that when users log in their authorization lookups will match correctly.
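For example, to normalize everything to lower case (a common choice; use upper instead if that is how your directory stores names), the two usersync properties would look like this:

ranger.usersync.ldap.username.caseconversion=lower
ranger.usersync.ldap.groupname.caseconversion=lower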
01-05-2017
06:23 PM
5 Kudos
General Recommendations
- Only fail tuples if there's a recoverable exception
- Use an exception / error logging bolt / stream and send the output to a Kafka topic so all logs are consolidated in one place
- Be careful about supervisor saturation / overallocation, i.e. running more threads than available CPU resources
- Configure parallelism based on throughput requirements (use the throughput equation below to calculate it)
- Be aware of data spikes and plan for them in your topology / use cases / operations
- If using Kafka, set up Kafka producer-consumer lag analysis and a dashboard
- Benchmark topologies to establish "healthy" operational numbers for individual bolt performance

## Troubleshooting topology performance issues
- Check the topology stats from the Nimbus UI.
- Look for failures (in the spout or a specific bolt); if there are any, that module is having issues.
- If the topology has a Kafka error bolt, open a console consumer to look at the error logs from Kafka.
- Find out which supervisor the worker is running on and start tailing the topology logs (log files are named by the name of the topology and the deployment number; you can find this from the Nimbus UI).

If you are seeing lag building, there are 2 major reasons for it:
- a topology component is slow (bolt or spout)
- the end sink is having issues; again, sink-specific exceptions will be present in the output of the error bolt; follow the instructions above to see the logs

Topology issues due to slow bolts

Topologies can have performance issues because some bolts are running slow. A bolt can run slow due to 2 root causes:
1. the system it interacts with is slow
2. the topology received a data spike

Slow bolts are usually the ones interacting with external systems, e.g. lookup bolts or writer bolts. Such bolts are slow because their external systems are experiencing transient performance slowdowns, e.g. Elasticsearch recovery, an HBase region split, a major GC, etc.

**How to find the root cause of a slow bolt?** Analyze the capacity numbers and execute latencies; a slow bolt is one with capacity > 1.0, meaning there aren't enough bolt instances for the amount of data being processed.

The next step is to find out whether the root cause is an external system slowdown or a data spike. If execute latencies are reasonable, i.e. around your normal value, the external system is not the bottleneck and the root cause is a data spike.

Troubleshooting an external system bottleneck

The external system bottleneck should be resolved before topology performance can be restored. Sometimes it may be beneficial to pause or stop the topology in order to recover the external system, e.g. for topology-based data ingestion. ``Note: Sometimes a data spike can trigger the external system slowdown``

Troubleshooting a data spike

Data spikes can be resolved by adding more parallelism to the topology. This implies a balanced addition of executors and workers, i.e. incrementally adding more parallelism to the specific bolt and analyzing the capacity numbers. This process should be repeated until the bolt capacity is below 1.0. ``Note: Add more executors only when the bolt latencies are low or the code scales linearly``

Calculating current throughput vs. expected throughput is critical for determining the correct configuration for parallelism and worker count (see the worked example at the end of this article).

**Throughput = Executor count * 1000 / (process latency) * Capacity**

``Note: Adding more executors doesn't solve the problem if the supervisors are saturated``

Adding more workers to the topology to evenly balance the load across supervisors is critical for good performance. If more executors are scheduled on a worker than there is CPU capacity for, the result is CPU resource contention, leading to cascading performance effects across the other topology workers running on that supervisor. A quick way to analyze overallocation is to sum the number of executors in the cluster and divide by the total number of CPU cores across all supervisors. If this number is greater than 2 you have overallocated resources; however, this may not always cause performance issues, since overallocating executors for IO-bound bolts is sometimes required for performance and parallelism.

Known Issues

Ackers

Ackers in Storm may become a bottleneck if there are too few of them. They may also become a bottleneck if there aren't enough ackers per worker, because of network issues (seen in AWS multi-AZ deployments).

Tuple Failures

Tuple failures happen for 3 reasons:

1. Tuple timeouts

Tuple timeouts are the trickiest to resolve. They can happen for any of the following reasons:
- A slow sink combined with a bad max spout pending setting
- Micro-batching with untuned batch sizes (data is not coming in fast enough for your batches to fill and there is no time-based acking)
- An untuned max spout pending setting
- An incorrect tuple message timeout (default 30 seconds)

2. Bolt exceptions

Bolt exceptions will show up in the topology worker logs.

3. Explicit failures in the code, i.e. calling the collector.fail() method

Explicit failures will be shown in the Nimbus UI, in the failure counts for the respective bolts.

Saturated Workers

Storm's concept of slots only limits saturation and overallocation of memory, not of CPU / threads. This has to be managed manually by engineers: be careful not to over-provision the number of threads (executors) / workers.

Symptoms: the topology will appear to run slow without any tuple failures, but lag will continue to build.
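To make the throughput equation concrete, here is a purely illustrative worked example (assuming the process latency reported by the Nimbus UI is in milliseconds): a bolt with 4 executors, a process latency of 20 ms, and a capacity of 0.8 is currently pushing roughly 4 * (1000 / 20) * 0.8 = 160 tuples per second. At capacity 1.0 those same 4 executors would top out around 200 tuples per second, so if the incoming rate is, say, 320 tuples per second, lag will build; sustaining that rate needs roughly 320 / (1000 / 20) = 6.4, i.e. at least 7 executors, provided the external system and the supervisors' CPU can absorb the extra parallelism.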
11-23-2016
06:39 PM
@Constantin Stanca thank you for the feedback. I have reworded the point and added some explanations; please let me know your thoughts.
11-19-2016
09:24 PM
6 Kudos
The objective of this article is to have a discussion and capture critical points to consider when developing a streaming application using Storm, Spark, or any other streaming technology. The intent is to fold additional points from discussion and comments into this article so it evolves into a comprehensive guideline for stream engineering. Here are a few points synthesized from my experience developing streaming applications using Storm and Flume:

1. Select the Streaming Application's SLA requirements

Most streaming applications can be classified by their SLA requirements into:
- At-most once (data drops are acceptable)
- At-least once (data drops are NOT acceptable)
- Exactly once (idempotent computations)

These requirements tend to heavily drive design decisions such as whether or not to ack tuples (in Storm), or whether downsampling is acceptable (sensor data processing).

2. Minimal and careful data replays

In guaranteed-processing SLA use cases, data replay must be minimized by application logic to avoid situations like replay loops and heavy back pressure. This situation is seen in poorly designed topologies where incorrect exception handling leads to a single datapoint being replayed infinitely.

3. Minimize processing latencies

The latency of processing each individual event/tuple adds up to the cumulative processing latency, therefore the streaming application must be engineered for low latency, using appropriate technologies for local in-memory caching and micro-batching where possible to minimize network latencies and amortize their cost over several calls.

4. Tradeoffs between throughput and latency

Performance tradeoffs are usually between throughput and latency and can be tuned using micro-batching (Storm Trident or Spark Streaming). A micro-batching approach may use time-based or size-based batches, both of which have caveats.

Size-based micro-batching may add latency, since the buffer must fill up before processing is applied or transfers are executed, and it is therefore subject to event velocity. This model can't be used if there are hard latency limits for the application; however, if a sustained minimum event rate (velocity) is guaranteed, then micro-batching can be applied while preserving acceptable latencies.

Time-based micro-batching is subject to stability and performance issues if spikes (in volume and velocity) are not accounted for. Time-based micro-batching can satisfy the hard latency constraints of a use case.

A hybrid model of micro-batching can also be deployed which uses both size-based and time-based batching, with hard limits on both, to guarantee low latency and high throughput while providing stability.

5. Aggregation bottlenecks

Use cases where streaming aggregations are performed must:
- account for event volume at the pivot point of aggregation (the aggregation key) to avoid bottlenecks and pipeline back pressure
- account for the distribution of event volume across aggregation keys

6. Polling and Event Sourcing

Polling and Event Sourcing are two prominent design patterns for updating configuration and logic in a streaming pipeline. Logic may include, but is not limited to: dynamic filtering, caching for data enrichment, and machine learning models.

In a polling-based design, these updates are polled for at a pre-determined frequency in an out-of-band fashion (a separate update thread). In an event-sourcing-based design, the updates are delivered to the processing component as events, and the processing component has a separate code branch (an if statement) to handle this kind of event and apply the logic update. The event-sourcing approach allows for a lock-free design, whereas a polling-based design requires locking (at some point) and concurrent data structures to guarantee consistency.

7. Error Handling

Error handling must be given thorough design thought, as incorrect error handling will lead to application downtime and performance issues. Recoverable errors, such as network unavailability or a failover, may allow for replays; unrecoverable errors, however, must be written to a separate stream without blocking the primary execution pipeline. NiFi handles this very seamlessly using Success and Failure streams.