Member since: 11-07-2016
Posts: 58
Kudos Received: 26
Solutions: 6
My Accepted Solutions
Views | Posted
---|---
1572 | 05-17-2017 04:57 PM
4572 | 03-17-2017 06:51 PM
2087 | 01-14-2017 07:03 PM
3267 | 01-14-2017 06:59 PM
1734 | 12-29-2016 06:45 PM
01-02-2018
09:27 PM
1 Kudo
For certain large environments, it's very easy for the Spark History Server (SHS) to get overwhelmed by the large number of applications being posted and the number of users / developers viewing history data. Spark jobs create an artifact called the history file, which is what the SHS parses and serves via its UI. The size of this file has a huge impact on the load placed on the SHS; also note that the size of the history file is driven by the number of events the application generates (a small executor heartbeat interval, for example, produces many more events).

Workaround: If you are still interested in analyzing performance issues with these large history files, one option is to download them and browse them from a locally hosted SHS instance. To run this:

1. Download Spark 1.6 from https://spark.apache.org/downloads.html
2. Unpack it.
3. Create a directory to hold the logs called spark-logs.
4. Create a properties file called test.properties.
5. Inside test.properties add: spark.history.fs.logDirectory=<path to the spark-logs directory>
6. Run <spark download>/sbin/start-history-server.sh --properties-file <path to test.properties>
7. Open a web browser and visit http://localhost:18080

Once done, you can download Spark History files from HDFS and copy them into the spark-logs directory. The running Spark History Server will dynamically load the files as they become available there. A rough end-to-end sketch of these steps follows below.
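The following is only a sketch of the steps above; the exact Spark 1.6.x package name, the local paths (~/spark-logs, ~/test.properties), and the HDFS source path are examples and depend on your environment (the HDFS location is whatever spark.history.fs.logDirectory points to on your cluster).

# Unpack the Spark 1.6.x download and prepare a local log directory
tar xzf spark-1.6.3-bin-hadoop2.6.tgz
mkdir -p ~/spark-logs
# Point a standalone SHS at the local directory
echo "spark.history.fs.logDirectory=file:///home/$USER/spark-logs" > ~/test.properties
./spark-1.6.3-bin-hadoop2.6/sbin/start-history-server.sh --properties-file ~/test.properties
# Copy history files out of HDFS into the watched directory, then browse http://localhost:18080
hdfs dfs -copyToLocal /spark-history/<application_id> ~/spark-logs/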
11-01-2017
12:15 AM
Abstract:
Nimbus metrics are critical to operations as well as development teams for monitoring the performance and stability of Storm applications / topologies. Most production environments already have a metrics / operations monitoring system such as Solr, Elasticsearch, a TSDB, etc. This post shows you how you can use collectd to forward these metrics over to your desired metrics environment and alert on them.
Solution:
Collectd is a standard metrics collection tool that runs natively on Linux operating systems. It's capable of capturing a wide variety of metrics; you can find more information on collectd here: https://collectd.org/
To capture Storm Nimbus metrics, here's a collectd plugin that needs to be compiled and built with Maven: https://github.com/srotya/storm-collectd. Simply run:
mvn clean package assembly:single
In addition, you will need to install collectd and ensure that it has Java plugin capability. Here's a great post on how to do that: http://blog.asquareb.com/blog/2014/06/09/enabling-java-plugin-for-collectd/ (note that the JAR="/path/to/jar" and JAVAC="/path/to/javac" variables need to be set to valid paths before you can run it). A rough build sketch follows below.
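Here's a rough sketch, assuming you are compiling collectd from source with the Java plugin enabled; the install prefix and JDK location below are illustrative, and the JAR / JAVAC details for your particular collectd version are covered in the blog post above.

# Build collectd with Java plugin support (adjust JAVA_HOME for your system)
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk
./configure --prefix=/opt/collectd --with-java="$JAVA_HOME"
make && sudo make install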
Once installed, you will need to configure collectd using the following (don't forget to also configure an output plugin so the metrics actually get forwarded somewhere):
LoadPlugin java
<Plugin "java">
  # The required JVM argument is the classpath
  # JVMArg "-Djava.class.path=/installpath/collectd/share/collectd/java"
  # Since version 4.8.4 (commit c983405) the API and GenericJMX plugin are
  # provided as .jar files.
  JVMArg "-Djava.class.path=<ABSOLUTE PATH>/lib/collectd-api.jar:<ABSOLUTE PATH>/target/storm-collectd-0.0.1-SNAPSHOT-jar-with-dependencies.jar"
  LoadPlugin "com.srotya.collectd.storm.StormNimbusMetrics"
  <Plugin "storm">
    address "http://localhost:8084/"
    kerberos false
    jaas "<PATH TO JAAS CONF>"
  </Plugin>
</Plugin>
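After saving the configuration, restart collectd and watch its logs to confirm the Java plugin starts and the StormNimbusMetrics class loads; the service name and log location below are typical but vary by distribution.

# Restart collectd and follow its log output
sudo systemctl restart collectd      # or: sudo service collectd restart
sudo journalctl -u collectd -f       # look for JVM classpath or class-loading errors

If the classpath in the JVMArg line is wrong, collectd will generally log the failure at startup rather than failing silently, so the log is the first place to look.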
10-31-2017
11:59 PM
2 Kudos
Problem: If you have an AD/LDAP environment and are using HDP with Ranger, it's critical to review the case in which usernames and group IDs are stored in your directory services environment. Ranger authorization is case sensitive, so if the username / group ID doesn't match the one returned from the directory (AD/LDAP), authorization will be denied.

Solution: To solve this problem, Ranger offers two parameters that can be set via Ambari. This should ideally be done at install time to avoid the need to re-sync all users. The Ranger usersync properties for case conversion are:

ranger.usersync.ldap.username.caseconversion
ranger.usersync.ldap.groupname.caseconversion

You can set these properties to lower or upper; this makes sure that Ranger stores usernames and groups in the specified format in its local database, so when users log in their authorization lookups will match correctly.
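As an example, here is a minimal sketch of the two usersync properties set for a directory that should be normalized to lowercase (set them through the Ranger usersync configuration in Ambari; lower and upper are the supported values):

ranger.usersync.ldap.username.caseconversion=lower
ranger.usersync.ldap.groupname.caseconversion=lower

Since the conversion only applies as users and groups are synced, setting these before the first sync (ideally at install time, as noted above) avoids having to re-sync everything later.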
03-29-2017
03:07 AM
@Ambud Sharma we are testing this change and will accept once we are done. I am still not 100% convinced that this solves the problem, since the Storm documentation says BasicBolt does the acking and anchoring: http://storm.apache.org/releases/1.0.1/Guaranteeing-message-processing.html. Search for BasicBolt in that link and you will find "Storm has an interface called BasicBolt that encapsulates this pattern for you."
01-23-2017
01:36 AM
1 Kudo
Good write-up from @Ambud Sharma; you can also visit http://storm.apache.org/releases/1.0.2/Guaranteeing-message-processing.html for info straight from the source. Additionally, take a peek at the picture below, which I just exported from our http://hortonworks.com/training/class/hdp-developer-storm-and-trident-fundamentals/ course; it might help visualize all of this information. Good luck and happy Storming!
10-22-2018
07:16 AM
Hi @Ambud Sharma, I am new to HCP and Storm, and I am running through the squid use case. From the Kafka console consumer I can see that Kafka is receiving data, and from the Metron UI my squid sensor is running, but throughput is 0 KB/s. From the Storm UI the squid topology is active with 1 worker and 5 executors, but data is not coming into the topology. In the logs under storm/workers-artifacts, worker.log.err is empty, and worker.log stops at the point shown in the image below.
11-24-2016
04:00 AM
@ambud.sharma Voted up :). Before, it was counter-intuitive.
11-14-2016
09:55 PM
So the repo needs to be on shared storage (like NFS) between the NiFi nodes?