Created 07-11-2024 05:22 AM
How do I set up TeraGen and TeraSort performance testing in a CDP Private Cloud cluster?
Created 07-17-2024 09:18 AM
During the first iteration of the TeraGen job, the goal is to obtain a performance baseline for the disk I/O subsystem. Override the HDFS replication factor from its default value of 3 and set it to 1, so that the data generated by the TeraGen job is not replicated to additional DataNodes. Replicating the data over the network would obscure the raw disk performance behind potential network bandwidth constraints.
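As a rough illustration of the first run, here is a minimal Python sketch that assembles the TeraGen command line with the replication factor overridden to 1. The jar path is an assumption based on a typical parcel-based install and may differ on your cluster; the output directory and data size are hypothetical placeholders. The helper name `teragen_command` is made up for this example. It relies on TeraGen writing fixed 100-byte rows and on the standard `-Ddfs.replication` generic option.

```python
import shlex

# ASSUMPTION: typical parcel location of the examples jar; verify on your cluster.
EXAMPLES_JAR = "/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar"
ROW_BYTES = 100  # TeraGen generates rows of 100 bytes each


def teragen_command(total_bytes, output_dir, replication, jar=EXAMPLES_JAR):
    """Build the argv list for a TeraGen run (hypothetical helper)."""
    rows = total_bytes // ROW_BYTES  # TeraGen takes a row count, not a byte count
    return [
        "hadoop", "jar", jar, "teragen",
        f"-Ddfs.replication={replication}",  # per-job override of the HDFS default
        str(rows),
        output_dir,
    ]


if __name__ == "__main__":
    # Hypothetical example: about 1 TB of data, written without replication.
    cmd = teragen_command(10**12, "/benchmarks/teragen-rep1", replication=1)
    print(shlex.join(cmd))  # copy the printed command into a shell on a gateway node
```

The printed command can be run as-is from a node with the Hadoop client configured. Calling the same helper with `replication=3` and a different output directory gives the command for the second iteration.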
Once the first TeraGen job has been run, a second iteration should be run with the HDFS replication factor set to the default value. This applies a high load on the network, and deltas between the first run and second run can provide an indication of network bottlenecks in the cluster.
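To make the comparison between the two runs concrete, the following small Python sketch computes the absolute and relative difference in elapsed time. The function name and the timings in the usage example are hypothetical; take the real elapsed times from your own job output. What size of delta counts as a bottleneck depends on your hardware, so no threshold is suggested here.

```python
def replication_overhead(baseline_seconds, replicated_seconds):
    """Return (delta_seconds, delta_percent) of the replicated run vs. the baseline.

    baseline_seconds:   elapsed time of the TeraGen run with replication factor 1
    replicated_seconds: elapsed time of the TeraGen run with the default replication
    """
    if baseline_seconds <= 0:
        raise ValueError("baseline_seconds must be positive")
    delta = replicated_seconds - baseline_seconds
    return delta, 100.0 * delta / baseline_seconds


if __name__ == "__main__":
    # Hypothetical timings, for illustration only.
    delta, percent = replication_overhead(600, 900)
    print(f"Replicated run took {delta} s longer ({percent:.1f}% over baseline)")
```

A large relative increase with the same data size and the same disks suggests the extra time is being spent on network transfer rather than on local disk writes.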
Please check the documentation below for the full set of commands and details -
Created 07-22-2024 01:37 PM
@Abdelghani Has the reply helped resolve your issue? If so, please mark the appropriate reply as the solution, as it will make it easier for others to find the answer in the future. Thanks.
Regards,
Diana Torres,