Measuring HDP Performance, Scale and Reliability

deepesh1 · ‎10-22-2015

Several open-source tools are available for testing the performance, scalability, and reliability of popular HDP components. Here are some of the most commonly used ones:

HDFS

TestDFSIO

Measures the I/O performance of HDFS in your cluster. Source code for the tool can be found here.

NameNode Benchmark

Applies load on the NameNode by performing continuous read, write, rename, and delete operations on small files. Source code for this tool can be found here.

Synthetic Load Generator

The synthetic load generator (SLG) is a tool for testing NameNode behavior under different client loads. The user can generate different mixes of read, write, and list requests by specifying the probabilities of read and write. The user controls the intensity of the load by adjusting parameters for the number of worker threads and the delay between operations. More information on the tool can be found here.
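
As a rough illustration of how such a request mix can be driven by two probabilities, here is a minimal Python sketch. It is not the SLG's own code: the function names are hypothetical, and it assumes that whatever probability is left after read and write goes to list requests.

```python
import random
from collections import Counter


def pick_operation(read_prob, write_prob, rng):
    """Pick 'read', 'write', or 'list' for one simulated request.

    Assumption: the list probability is the remainder,
    1 - read_prob - write_prob.
    """
    if read_prob < 0 or write_prob < 0 or read_prob + write_prob > 1:
        raise ValueError("probabilities must be non-negative and sum to at most 1")
    r = rng.random()  # uniform in [0, 1)
    if r < read_prob:
        return "read"
    if r < read_prob + write_prob:
        return "write"
    return "list"


def simulate_mix(read_prob, write_prob, num_ops, seed=0):
    """Count how many operations of each kind a mix would produce."""
    rng = random.Random(seed)  # seeded so a run is repeatable
    return Counter(pick_operation(read_prob, write_prob, rng) for _ in range(num_ops))


if __name__ == "__main__":
    # Hypothetical mix: 60% reads, 30% writes, remaining 10% lists.
    print(simulate_mix(0.6, 0.3, 10_000))
```

In the real tool the load intensity is controlled separately, through the number of worker threads and the delay between operations; this sketch models only the choice of operation.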

YARN/MR

TeraSort

Measures performance as the time taken to sort 1 TB of data. The test runs in three steps: TeraGen generates the dataset, TeraSort sorts the generated data, and TeraValidate verifies that the sort order is correct. You can also run the test with a different data size.
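
The three steps can be sketched as a small Python helper that builds the command lines without running them. This is an illustration under stated assumptions: the jar file name and HDFS directories are placeholders, and the row count relies on TeraGen writing fixed 100-byte rows.

```python
ROW_BYTES = 100  # TeraGen writes fixed-size 100-byte rows


def terasort_commands(data_bytes, base_dir, examples_jar):
    """Return the TeraGen, TeraSort, and TeraValidate commands as argument lists.

    base_dir and examples_jar are placeholders; substitute the paths
    that apply to your cluster.
    """
    if data_bytes < ROW_BYTES:
        raise ValueError("data size must be at least one row")
    rows = data_bytes // ROW_BYTES
    gen_dir = f"{base_dir}/teragen"
    sort_dir = f"{base_dir}/terasort"
    report_dir = f"{base_dir}/teravalidate"
    prefix = ["hadoop", "jar", examples_jar]
    return [
        prefix + ["teragen", str(rows), gen_dir],          # step 1: generate
        prefix + ["terasort", gen_dir, sort_dir],          # step 2: sort
        prefix + ["teravalidate", sort_dir, report_dir],   # step 3: verify order
    ]


if __name__ == "__main__":
    # 1 TB (10**12 bytes) corresponds to 10**10 rows.
    for cmd in terasort_commands(10**12, "/benchmarks/tera", "hadoop-mapreduce-examples.jar"):
        print(" ".join(cmd))
```

Passing a smaller `data_bytes` value is one way to run the same three steps on a reduced dataset.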

MapReduce Benchmark

Runs a job multiple times and takes the average of all runs. Source code for the tool can be found here.
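
The run-and-average idea can be illustrated with a short Python sketch. It is not the benchmark's implementation: `average_runtime` and its parameters are hypothetical names, and the job here is any Python callable rather than a real MapReduce job.

```python
import time


def average_runtime(job, num_runs, timer=time.perf_counter):
    """Run job() num_runs times; return (average seconds, per-run durations).

    timer is injectable so the function can be checked with a fake clock.
    """
    if num_runs < 1:
        raise ValueError("num_runs must be at least 1")
    durations = []
    for _ in range(num_runs):
        start = timer()
        job()
        durations.append(timer() - start)
    return sum(durations) / len(durations), durations


if __name__ == "__main__":
    # Stand-in workload; a real harness would submit a job and wait for it.
    avg, runs = average_runtime(lambda: sum(range(100_000)), num_runs=5)
    print(f"average over {len(runs)} runs: {avg:.6f} s")
```

Returning the individual durations alongside the average makes it possible to spot an outlier run that would otherwise skew the mean.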

GridMix

GridMix submits a mix of synthetic jobs, modeling a profile mined from production loads. More information on the tool can be found here.

HBase

YCSB

The Yahoo! Cloud Serving Benchmark (YCSB) evaluates HBase performance under predefined workloads. More information can be found here.

HBase Performance Evaluation

A script for evaluating HBase performance and scalability. It runs an HBase client that steps through one of a set of hardcoded tests or "experiments" (for example, a random reads test or a random writes test). More information can be found here.

LoadTest Tool

A command-line utility that reads, writes, and verifies data. Unlike PerformanceEvaluation, this tool validates the data written, and supports simultaneously writing and reading the same set of keys. Source for the tool can be found here.

ChaosMonkey

A utility that injects faults into a running cluster. More information can be found here.

Hive

TPC Benchmarks (TPC-DS & TPC-H)

TPC-DS and TPC-H are analytic benchmarks that model generally applicable aspects of a decision support system. Automated scripts to run the TPC benchmarks at scale, including the converted queries, can be found here.

Pig

PigMix

PigMix is a set of queries used to test Pig performance. More information can be found here.

aervits · ‎04-27-2016

@Deepesh here's a Spark benchmark from @vshukla https://community.hortonworks.com/questions/29085/spark-benchmarking-tools.html

paparazi257 · ‎07-28-2017

Thanks for sharing. I think this is useful, and I would like to share it on my website http://www.thuexetai.info

08-17-2017

It's really helpful, thanks.

--------------------

https://moshimoshi.vn/
