Member since
11-07-2016
58
Posts
26
Kudos Received
6
Solutions
01-02-2018
09:27 PM
1 Kudo
In large environments, the Spark History Server (SHS) can easily be overwhelmed by the number of applications being posted and the number of users and developers viewing history data. Each Spark job writes an artifact called the history file, which the SHS parses and serves through its UI. The size of this file has a large impact on SHS load. Note that its size is determined by the number of events the application generates; a small executor heartbeat interval, for example, produces more events.

Workaround: If you still want to analyze performance issues with these large history files, one option is to download them and browse them from a locally hosted SHS instance. To run this:

1. Download Spark 1.6 from https://spark.apache.org/downloads.html and unpack it.
2. Create a directory called spark-logs to hold the logs.
3. Create a properties file called test.properties.
4. Inside test.properties, add: spark.history.fs.logDirectory=<path to the spark-logs directory>
5. Run: <spark download>/sbin/start-history-server.sh --properties-file <path to test.properties>
6. Open a web browser and visit http://localhost:18080

Once this is done, you can download Spark history files from HDFS and copy them into this directory. The running Spark History Server will load the files dynamically as they become available in the spark-logs directory.
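The directory and properties-file steps can be scripted. Below is a minimal Python sketch, not part of the original workaround: the function name prepare_local_shs and both of its arguments are hypothetical, and it only builds the start command rather than running it. It assumes Spark has already been downloaded and unpacked.

```python
from pathlib import Path


def prepare_local_shs(base_dir, spark_home):
    """Create spark-logs and test.properties, and return the start command."""
    base = Path(base_dir).resolve()

    # Directory that will hold the history files copied down from HDFS.
    log_dir = base / "spark-logs"
    log_dir.mkdir(parents=True, exist_ok=True)

    # Properties file pointing the History Server at that directory.
    props = base / "test.properties"
    props.write_text(f"spark.history.fs.logDirectory={log_dir}\n")

    # Command from the steps in the post; it is returned, not executed here.
    start_script = Path(spark_home) / "sbin" / "start-history-server.sh"
    command = [str(start_script), "--properties-file", str(props)]
    return log_dir, props, command
```

Calling prepare_local_shs with a working directory and the path of the unpacked Spark download returns the log directory, the properties file, and the command list, which you can run yourself (for example with subprocess.run) once you have checked the paths.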
11-01-2017
12:15 AM
Abstract:
Nimbus metrics are critical to operations and development teams for monitoring the performance and stability of Storm applications (topologies). Most production environments already have a metrics or operations monitoring system such as Solr, Elasticsearch, or a TSDB. This post shows how you can use Collectd to forward these metrics to your metrics environment of choice and alert on them.
Solution:
Collectd is a standard metrics collection tool that runs natively on Linux operating systems. It can capture a wide variety of metrics; you can find more information on Collectd here: https://collectd.org/
To capture Storm Nimbus metrics, here's a collectd plugin that needs to be compiled and built with Maven: https://github.com/srotya/storm-collectd. Simply run:
mvn clean package assembly:single
In addition, you will need to install collectd and ensure that it has Java plugin capability. Here's a great post on how to do that:
http://blog.asquareb.com/blog/2014/06/09/enabling-java-plugin-for-collectd/ (Please note that the JAR="/path/to/jar" JAVAC="/path/to/javac" variables need to be fixed before you can run it)
Once installed, you will need to configure collectd using the following: (DON'T FORGET TO CONFIGURE OUTPUT PLUGIN)
LoadPlugin java
<Plugin "java">
  # required JVM argument is the classpath
  # JVMArg "-Djava.class.path=/installpath/collectd/share/collectd/java"
  # Since version 4.8.4 (commit c983405) the API and GenericJMX plugin are
  # provided as .jar files.
  JVMARG "-Djava.class.path=<ABSOLUTE PATH>/lib/collectd-api.jar:<ABSOLUTE PATH>/target/storm-collectd-0.0.1-SNAPSHOT-jar-with-dependencies.jar"
  LoadPlugin "com.srotya.collectd.storm.StormNimbusMetrics"
  <Plugin "storm">
    address "http://localhost:8084/"
    kerberos false
    jaas "<PATH TO JAAS CONF>"
  </Plugin>
</Plugin>
10-31-2017
11:59 PM
2 Kudos
Problem: If you have an AD/LDAP environment and are using HDP with Ranger, it's critical to review the case in which usernames and group names are stored in your directory service. Ranger authorization is case sensitive, so if the username or group name doesn't match the one returned from the directory (AD/LDAP), authorization will be denied.
Solution: Ranger offers two parameters for this, which can be set via Ambari. Ideally, set them at install time to avoid having to re-sync all users. The Ranger usersync properties for case conversion are:
ranger.usersync.ldap.username.caseconversion
ranger.usersync.ldap.groupname.caseconversion
You can set these properties to lower or upper. Ranger will then store usernames and group names in the specified case in its local database, so that when users log in, their username and groups match correctly.
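To illustrate why the case mismatch matters, here is a small Python sketch. It is not Ranger code: the function names, the user names, and the simple membership check are all assumptions made for the example. It only shows how a case-sensitive comparison fails until the stored names are normalized at sync time.

```python
def convert_case(name, mode):
    """Apply a case conversion: 'lower' or 'upper'; any other value leaves the name as is."""
    if mode == "lower":
        return name.lower()
    if mode == "upper":
        return name.upper()
    return name


def is_authorized(login_name, policy_users):
    """Case-sensitive membership check, standing in for the behavior described in the post."""
    return login_name in policy_users


# Hypothetical names: the directory returns mixed case, the user logs in with lowercase.
directory_names = ["JSmith", "ADavis"]
login_name = "jsmith"

# Without conversion, stored names keep the directory's case and the match fails.
stored_as_is = {convert_case(n, "unchanged") for n in directory_names}
# With conversion set to lower, stored names are normalized when they are synced.
stored_lower = {convert_case(n, "lower") for n in directory_names}

print(is_authorized(login_name, stored_as_is))  # False
print(is_authorized(login_name, stored_lower))  # True
```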
05-17-2017
04:57 PM
@Naveen Keshava, why are you trying to run it using Maven? The issue is that the code requires command-line arguments, but Maven is treating those arguments as goals. Please compile and run it independently of Maven, even if you are trying to run it locally.
03-22-2017
07:12 PM
1 Kudo
Here's some example code to show you how explicit anchoring and acking can be done:
https://github.com/Symantec/hendrix/blob/current/hendrix-storm/src/main/java/io/symcpe/hendrix/storm/bolts/ErrorBolt.java
03-22-2017
07:06 PM
Yes, that is incorrect. BaseBasicBolt (https://github.com/apache/storm/blob/master/storm-core/src/jvm/org/apache/storm/topology/base/BaseBasicBolt.java) doesn't even have a collector to acknowledge messages.
03-22-2017
06:33 PM
@Laxmi Chary You should be anchoring. Without anchoring, Storm doesn't guarantee at-least-once semantics, which means delivery is best effort. Whether you anchor depends on the delivery semantics you need. You should also be using BaseRichBolt; otherwise you don't have a collector.
03-17-2017
06:51 PM
1 Kudo
@Laxmi Chary thanks for your question.
Do you know if there's ever a case where the message from Bolt 2 doesn't get written but the one from Bolt 3 does?
Are you anchoring tuples in your topology? For example: collector.emit(tuple, new Values(...)) [the tuple is the anchor]
Are you doing any microbatching in your topology?
02-10-2017
06:36 PM
Please make sure you have a Topic created in Kafka: https://kafka.apache.org/documentation/#quickstart_createtopic