10-21-2016
09:11 PM
15 Kudos
Introduction

How many times have we heard that a problem can't be solved unless it is completely understood? Yet all of us, however much experience and technical skill we have, still tend to jump straight into the solution space, bypassing key steps of the scientific approach. We just don't have enough patience and, quite often, the business is on our back to deliver a quick fix. We just want to get it done! We find ourselves like the squirrel in Ice Age, chasing the acorn on experience and minimal experimentation, dreaming that the acorn will fall into our lap. This article re-emphasizes a repeatable approach to tuning Storm topologies: understand the problem, form a hypothesis, test that hypothesis, collect the data, analyze the data, and draw conclusions based on facts rather than intuition. Hopefully it helps you catch more acorns!

Understand the Problem

Understand the functional problem the topology solves; quite often the best tuning is executing a step differently in the workflow for the same end result. For example, apply a filter first and then execute an expensive step only on the records that pass, rather than executing the step for every record and paying the penalty every time. You can also prioritize success, or learn to lose in order to win.

Define SLA for Business Success

This is also the time to get a sense of the business SLA and forecasted growth. Document the SLA by topology. Try to understand the reasons behind the SLA and identify opportunities for trade-offs.

Gather Data

Gather data using Storm UI and other tools specific to your environment. Storm UI is your first go-to tool for tuning, most of the time. Another tool is Storm's built-in metrics-collecting API, available since 0.9.x, which lets you embed metrics in your own topology code. You may have tuned the topology, deployed it to production, and made everybody happy, only to see the issue come back two days later. Wouldn't it be nice to have metrics embedded in your code, already focused on the usual suspects: external calls to a web service, queries against a lookup data store, specific customers, or some other strong business entity that, given the data and the workflow, can become the bottleneck?
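As a sketch of what that can look like with the 0.9.x metrics API (the class name LookupBolt and the metric names are illustrative, not from any particular topology): the bolt registers a built-in CountMetric and a ReducedMetric in prepare(), updates them around its external call, and the values are reported every 60 seconds.

```java
import java.util.Map;
import backtype.storm.metric.api.CountMetric;
import backtype.storm.metric.api.MeanReducer;
import backtype.storm.metric.api.ReducedMetric;
import backtype.storm.task.OutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseRichBolt;
import backtype.storm.tuple.Tuple;

public class LookupBolt extends BaseRichBolt {
    private transient CountMetric lookupCalls;       // number of external lookups
    private transient ReducedMetric lookupLatency;   // mean lookup latency in ms
    private OutputCollector collector;

    @Override
    public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
        this.collector = collector;
        // publish both metrics to the registered metrics consumer every 60 seconds
        lookupCalls = context.registerMetric("lookup-calls", new CountMetric(), 60);
        lookupLatency = context.registerMetric("lookup-latency-ms",
                new ReducedMetric(new MeanReducer()), 60);
    }

    @Override
    public void execute(Tuple tuple) {
        long start = System.currentTimeMillis();
        // ... call the web service / lookup data store here ...
        lookupCalls.incr();
        lookupLatency.update(System.currentTimeMillis() - start);
        collector.ack(tuple);
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) { }
}
```

To actually see the numbers, register a consumer in the topology config, e.g. conf.registerMetricsConsumer(backtype.storm.metric.LoggingMetricsConsumer.class, 1);, which writes the metrics to the worker logs.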
In a Storm topology, spouts, bolts, and workers can each play a role in your performance challenges. From Storm UI, learn about topology status, uptime, number of workers, executors, and tasks, high-level stats across four time windows, spout statistics, bolt statistics, a visualization of the spouts and bolts and how they are connected, the flow of tuples between all of the streams, and all configurations for the topology.

Define Baseline Numbers

Document statistics about how the topology performs in production. Set up a test environment and exercise the topology with different loads aligned to realistic business SLAs.

Tuning Options vs. Bottleneck
Increase parallelism (use the rebalance command) and analyze the change in the capacity metric in the same Storm UI. If there is no improvement, focus on the bolt; that is the likely bottleneck. If you see a backlog upstream (most likely in Kafka), focus on the spout. After experimenting with spouts and bolts, shift focus to worker parallelism. The basic principle is to scale on a single worker with executors until adding executors no longer helps. If adjusting spout and bolt parallelism fails to provide additional benefit, play with the number of workers to see whether you are now bound by a single JVM and need to parallelize across JVMs.
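The rebalance command lets you adjust worker and executor counts on a running topology without redeploying it. A minimal sketch (the topology and component names are illustrative):

```
# spread "my-topology" across 4 workers, giving the Kafka spout 2 executors
# and the suspect lookup bolt 8; wait 30s for in-flight tuples to drain first
storm rebalance my-topology -w 30 -n 4 -e kafka-spout=2 -e lookup-bolt=8
```

Note that executors can only be scaled up to the number of tasks set when the topology was submitted, so leave yourself headroom there if you plan to rebalance.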
If you still don't meet the SLA after tuning the existing spouts, bolts, and workers, it's time to start controlling the rate at which data flows into the topology.
Max spout pending sets the maximum number of tuples that can be unacked at any given time. If the number of possible unacked tuples is lower than the total parallelism you've set for your topology, it can itself become the bottleneck. The goal, with one or many spouts, is to ensure that the maximum possible number of unacked tuples is greater than the maximum number of tuples your parallelization can process, so you can feel safe saying that max spout pending isn't causing a bottleneck. Without max spout pending, tuples will continue to flow into your topology whether or not you can keep up with processing them; with it, you control the ingest rate. Max spout pending lets you erect a dam in front of the topology, apply backpressure, and avoid being overwhelmed. Despite being optional, you should always set it.
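Setting it is a one-liner on the topology config; a minimal sketch (the ceiling of 1,000 is illustrative; the value applies per spout task and only takes effect in reliable, i.e. acking, topologies):

```java
import backtype.storm.Config;

public class TuningConfig {
    public static Config withIngestCeiling() {
        Config conf = new Config();
        // at most 1,000 tuples may be pending (emitted but not yet acked)
        // per spout task; tune this against your parallelism and latency
        conf.setMaxSpoutPending(1000);
        return conf;
    }
}
```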
When attempting to increase performance to meet an SLA, increase the rate of data ingest by either increasing spout parallelism or increasing max spout pending. Assuming a 4x increase in the maximum number of active tuples allowed, you would expect the speed of messages leaving the queue to increase (maybe not by a factor of four, but it would certainly increase). If that pushes the capacity metric of any bolt back to one or near it, tune the bolts again, and repeat with spouts and bolts until you hit the SLA. These methods can be applied over and over until the SLAs are met; the effort is high, so automating the steps is desirable.

Deal with External Systems

When interacting with external services (such as a SOA service, database, or filesystem), it is easy to ratchet parallelism up to a level where limits in that external service keep your capacity from going higher. Before you start tuning parallelism in a topology that interacts with the outside world, be positive you have good metrics on that service. Otherwise you could keep turning up the parallelism on a bolt to the point that it brings the external service to its knees, crippling it under a mass of traffic it was never designed to handle.

Latency
Let's talk about one of the greatest enemies of fast code: latency. There is latency in accessing local memory, the hard drive, and other systems over the network. Different interactions have different levels of latency, and understanding the latency in your system is one of the keys to tuning your topology. You will usually get fairly consistent response times, and then all of a sudden those response times vary wildly because of any number of factors:

- The external service is having a garbage-collection event.
- A network switch somewhere is momentarily overloaded.
- Your coworker wrote a runaway query that is currently hogging most of the database's CPU.
- Something about the data itself is likely to cause the delay.

As the topology interacts with external services, be smarter about latency. You don't always need to increase parallelism and allocate more resources. You may discover, after investigation, that the variance is tied to a customer or induced by an exceptional event, e.g. elections or the Olympics. Certain records are simply going to be slower to look up, and sometimes one of the "fast" lookups ends up taking longer.
One strategy that has worked well with services that exhibit this problem is to perform the initial lookup attempt with a hard ceiling on the timeout (try various ceilings) and, if that fails, send the tuple to a less parallelized instance of the same bolt that runs with a longer timeout. The end result is that the time to process a large number of messages goes down. Your mileage will vary with this strategy, so test before you use it.
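A minimal sketch of the hard-ceiling part, assuming a thread pool and a placeholder queryExternalService call (neither is from the original article). In a bolt, a null result would be emitted to a separate stream feeding the slower, less parallelized instance:

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class TimeboxedLookup {
    private final ExecutorService pool = Executors.newFixedThreadPool(16);

    /** Fast-path lookup with a hard ceiling; null means "route to the slow bolt". */
    public String lookup(String key, long ceilingMs) {
        Future<String> future = pool.submit(() -> queryExternalService(key));
        try {
            return future.get(ceilingMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException | ExecutionException | InterruptedException e) {
            future.cancel(true); // abandon the fast attempt, let the slow path retry
            return null;
        }
    }

    private String queryExternalService(String key) {
        return "value-for-" + key; // placeholder for the real web-service/data-store call
    }
}
```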
As part of your tuning strategy, you may end up splitting a bolt in two: one for fast lookups and the other for slow lookups. At the very least, let what is fast go fast; don't back it up behind something slow. Treat the slow lookups differently and make sure they remain the exception. You may even decide to handle them out of band and still extract some value from that data. It is up to you to determine what is more important: accepting an overall bottleneck, marking slow records for later processing, or disregarding the ones producing the bottleneck. Maybe a batch approach for those is more efficient. This is a design consideration and, most of the time, a reasonable trade-off. Are you willing to accept the occasional loss of fidelity and still hit the SLA, because that matters more to your business? Is perfection helping or killing your business?

Summary
- All basic timing information for a topology can be found in the Storm UI.
- Metrics (both built-in and custom) are essential if you want a true understanding of how your topology is operating.
- Establishing a baseline set of performance numbers for your topology is the essential first step in the tuning process.
- Bottlenecks are indicated by a high capacity value for a spout/bolt and can be addressed by increasing parallelism.
- Increasing parallelism is best done in small increments so that you can gain a better understanding of the effects of each increase.
- Latency, both related to the data (intrinsic) and unrelated to it (extrinsic), can reduce your topology's throughput and may need to be addressed.
- Be ready for trade-offs.
References

Storm Applied: Strategies for Real-Time Event Processing by Sean T. Allen, Matthew Jankowski, and Peter Pathirana (Manning, 2015)
10-21-2016
12:06 AM
13 Kudos
Sizing

There's no simple rule of thumb for this; it's as much an art as a science, since it depends on the workloads and how chatty they are with your current ZooKeeper nodes. One way to look at it: ZooKeeper needs a majority of nodes up to serve requests, so with 3 ZKs you can afford to lose one, and with 5 you can afford to lose two. If your IT is aggressively applying security patches and other upgrades (firmware, kernel, Java, other packages used by Hadoop tools) and taking nodes down to do the job, then during those upgrades a 3-node ensemble runs with only two nodes, and if you are unlucky and one of them goes down, your whole cluster goes down with it. So in that case, 5 is better than 3. Warning: the more ZK nodes you have, the slower ZK becomes for writes, because every write must be acknowledged by a quorum.

Placement
ZooKeeper is a master service; as such, it can be collocated with other master services. Ideally, you would not want to collocate it with an HA service. It is quite light on memory and CPU requirements, but since it is disk-intensive, don't collocate it with other disk-intensive services like Kafka or HDFS.

Storage Requirements

In general, ZooKeeper doesn't require huge drives because it only stores metadata for the services it supports; 100G to 250G for the ZooKeeper data directory and logs is fine for many cluster deployments. Moreover, it is recommended to configure an automatic purging policy for the snapshot and log directories so that they don't end up filling all the local storage.
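In zoo.cfg, that purging policy looks like the following (the values are illustrative; the autopurge settings require ZooKeeper 3.4+):

```
# keep only the 3 most recent snapshots and matching transaction logs
autopurge.snapRetainCount=3
# run the purge task every 24 hours (0 disables auto-purge)
autopurge.purgeInterval=24
```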
Dedicated or Shared?

At Yahoo!, ZooKeeper is usually deployed on dedicated RHEL boxes with dual-core processors, 2GB of RAM, and 80GB IDE hard drives. For your Kafka/Storm cluster, consider deploying ZK on dedicated physical hardware (not virtual). The driving force for physical hardware, or at least for a dedicated disk, is the transaction log and the high-throughput nature of Kafka and Storm. Since Kafka is usually used with Storm, have a separate ZooKeeper cluster for Kafka and Storm. If Kafka and Storm must share one, make sure you don't put the ZooKeeper cluster on the Kafka nodes; put it on the Storm nodes instead.

Caution
Rather than growing to larger ZK clusters, it is better to split certain services out to their own ZKs when they put pressure on an otherwise fairly quiet ZK cluster. It is a good thing to have a separate set of ZKs per cluster: one quorum for Kafka, one quorum for Storm, one quorum for the rest (YARN, HBase, Hive, HDFS), possibly a separate quorum for HBase. The challenge is that more hardware and more administration are needed, but it pays off.

Be careful where you put the transaction log. The most performance-critical part of ZooKeeper is the transaction log, and ZooKeeper must sync transactions to media before it returns a response. A dedicated transaction-log device is key to consistent good performance; putting the log on a busy device will adversely impact performance. If you only have one storage device, put trace files on NFS and increase the snapshotCount; it doesn't eliminate the problem, but it can mitigate it. A dedicated partition is not enough: ZooKeeper writes the log sequentially, without seeking, and sharing the log device with other processes can cause seeks and contention, which in turn can cause multi-second delays.

Do not put ZooKeeper in a situation where it can swap. For ZooKeeper to function with any sort of timeliness, it simply cannot be allowed to swap. Remember that in ZooKeeper everything is ordered, so if one request hits the disk, all other queued requests hit the disk; going to disk unnecessarily will almost certainly degrade performance unacceptably. Therefore, make certain that the maximum heap size given to ZooKeeper is not bigger than the amount of real memory available to it. To avoid swapping, set the heap size to the amount of physical memory you have, minus what is needed by the OS and cache. The best way to determine an optimal heap size for your configuration is to run load tests; if for some reason you can't, be conservative in your estimates and choose a number well below the limit that would cause your machine to swap. For example, on a 4G machine, a 3G heap is a conservative starting point.
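A sketch of the two relevant knobs, one in zoo.cfg and one in conf/java.env (the paths and heap size are illustrative):

```
# zoo.cfg: snapshots on one device, transaction log on its own dedicated disk
dataDir=/hadoop/zookeeper
dataLogDir=/zklog/zookeeper

# conf/java.env: cap the heap below physical RAM so ZK never swaps
export JVMFLAGS="-Xmx3g"
```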
Best Practices

The ZooKeeper data directory contains the snapshot and transaction log files. It is good practice to periodically clean up this directory if the auto-purge option is not enabled, and an administrator may want to keep a backup of these files, depending on application needs. Since ZooKeeper is a replicated service, you only need to back up the data of one of the servers in the ensemble.

ZooKeeper uses Apache log4j as its logging infrastructure. As the logfiles grow in size, it is recommended that you set up auto-rollover of the logfiles using the built-in log4j facilities.

The list of ZooKeeper servers used by clients in their connection strings must match the list of ZooKeeper servers that each ZooKeeper server has; strange behaviors can occur if the lists don't match. Likewise, the server list in each ZooKeeper server's configuration file should be consistent with those of the other members of the ensemble.

The ZooKeeper transaction log must be configured on a dedicated device. This is very important for achieving the best performance from ZooKeeper.

The Java heap size should be chosen with care; swapping should never be allowed to happen on a ZooKeeper server. It helps if the ZooKeeper servers have reasonably large memory (RAM). System monitoring tools such as vmstat can be used to watch virtual-memory statistics and decide on the optimal memory size, depending on the application's needs. In any case, swapping must be avoided.

References:
https://community.hortonworks.com/questions/2498/best-practices-for-zookeeper-placement.html
https://community.hortonworks.com/questions/53025/zookeeper-performance-and-metrics-when-to-resize.html
https://community.hortonworks.com/questions/55868/zookeeper-on-even-master-nodes.html
Apache ZooKeeper Essentials by Saurav Haloi (Packt Publishing, 2015)
10-21-2016
03:50 AM
4 Kudos
@Hajime Basically, Ambari Infra is only a wrapper, a service like HDFS or YARN, that deploys and manages different components. At the moment, the only component is Solr, which is open source. As Constantin pointed out, the Ambari Infra stack can be found here: https://github.com/apache/ambari/tree/2ad42074f1633c5c6f56cf979bdaa49440457566/ambari-server/src/main/resources/common-services/AMBARI_INFRA/0.1.0
04-05-2017
03:12 PM
@Constantin Stanca I thought the proper way to do maintenance on a data node is to decommission it, so that the following happens:

- DataNode: safely replicates its HDFS data to other DataNodes
- NodeManager: stops accepting new job requests
- RegionServer: turns on drain mode

In an urgent situation, I could agree with your suggestion. However, please advise on the right approach in a scenario where you have the luxury of choosing the maintenance window.
10-11-2016
01:23 AM
4 Kudos
@Smart Solutions The two main options for replicating an HDFS structure are Falcon and distcp. The distcp command is not very feature-rich: you give it a path in the HDFS structure and a destination cluster, and it copies everything to the same path on the destination; if the copy fails, you need to start it again (a minimal example is sketched below). The other method for maintaining a replica of your HDFS structure is Falcon, which offers more data-movement options and lets you manage the lifecycle of all the data on both sides more effectively. If you're moving Hive table structures, there is some added complexity in making sure the tables are created on the DR side, but moving the actual files is done the same way. Since you excluded distcp as an option, I suggest looking at Falcon. Check this: http://hortonworks.com/hadoop-tutorial/mirroring-datasets-between-hadoop-clusters-with-apache-falcon/ If any response addressed your question, please vote and accept the best answer.
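For reference, a minimal distcp run of the kind described above could look like this (host names and paths are illustrative):

```
# copy /data/projects to the same path on the DR cluster
hadoop distcp hdfs://prod-nn:8020/data/projects hdfs://dr-nn:8020/data/projects

# on later runs, copy only what changed and remove deleted files on the target
hadoop distcp -update -delete hdfs://prod-nn:8020/data/projects hdfs://dr-nn:8020/data/projects
```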
10-06-2017
06:21 PM
@uday kv See this new article: https://community.hortonworks.com/articles/141035/jmeter-kerberos-setup-for-hive-load-testing.html
10-13-2016
01:25 AM
@ed day Please consider revising your posts to remove your server names; this is a public forum, and you're compromising your security.
09-30-2016
05:28 PM
5 Kudos
@Vijaya Narayana Reddy Bhoomi Reddy Edge nodes, while they may be in the same subnet as your HDP clusters, are not really part of the actual clusters, and as such there is no HDP configuration trick to redirect traffic via the edge nodes and the Private Link. If you wish to use the 10 Gb Private Link, it is just a matter of working with your network team to have those HDP clusters communicate via that Private Link instead of the firewall-channeled network (though I doubt they will want to do that). You did not put a number next to the "Firewall" line, but I assume its bandwidth is much smaller, since you want to use the other path. Maybe the network team needs to upgrade the firewall-channeled network to meet the SLA; that is the correct approach, rather than a trick that uses the Private Link between edge nodes. It would meet your SLA and also keep the network team happy by leaving the firewall function in place. The network team may be able to peer up those clusters to redirect the traffic through the Private Link without going through the edge nodes, bypassing the firewall-channeled network, but I am pretty sure that would break their network design principles. The best approach is to upgrade the firewall-channeled network to meet your needs.
10-02-2016
03:18 AM
3 Kudos
@Eric Periard The one you really need (1.1.0) is here: https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-examples/1.1.0 Please don't forget to vote and accept the response that helped.