Member since 07-11-2017 | 44 Posts | 1 Kudo Received | 0 Solutions
09-24-2019
02:18 AM
Hi @hadoopguy Yes, there is an impact: processing will take longer and operations will be queued. You have to handle the timeouts in your jobs carefully. Best, @helmi_khalifa
06-07-2019
06:11 AM
Hi @Jay Kumar SenSharma, Yes, we have installed Ambari and Ranger on the same node, and we are using HDP 2.6.5 in our cluster. Now I have a clear picture of why we are getting this error. Thank you so much for answering and for your detailed explanation.
09-02-2017
01:49 AM
1 Kudo
Hi @Aishwarya Dixit, pe, or "Performance Evaluation", is a MapReduce-based tool to test reads and writes to HBase. nclients means that 10*nclients mappers will be started to run the supplied pe command. Example: hbase pe randomWrite 2
...
2017-09-02 01:31:17,681 INFO [main] mapreduce.JobSubmitter: number of splits:20 starts an MR job with 20 mappers and 1 reducer. So you can start with a small number like 1-3 to make sure HBase works as expected, and then increase it to roughly the maximum number of mappers you can run on your cluster divided by 10. You can of course use a larger number, but then the mappers will run in multiple "waves".
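The 10x rule above can be sketched as a quick back-of-the-envelope calculation; the function names here are just illustrative helpers, not part of any HBase API:

```python
import math

# Sketch of the sizing logic described above: each pe client
# is split into 10 mappers by default.
def pe_mappers(nclients: int) -> int:
    """Mappers started for 'hbase pe <command> <nclients>'."""
    return 10 * nclients

def waves(nclients: int, cluster_mapper_slots: int) -> int:
    """Mapper 'waves' when the job exceeds cluster capacity."""
    return math.ceil(pe_mappers(nclients) / cluster_mapper_slots)

print(pe_mappers(2))   # hbase pe randomWrite 2 -> 20 mappers
print(waves(5, 20))    # 50 mappers on 20 slots -> 3 waves
```

This is why the last "wave" of mappers often finishes alone: 50 mappers on 20 slots run as 20 + 20 + 10.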
09-02-2017
02:17 AM
1 Kudo
The best way to learn about various pe options is to run "hbase pe" without any options or commands: $ hbase pe
Usage: java org.apache.hadoop.hbase.PerformanceEvaluation <OPTIONS> [-D<property=value>]* <command> <nclients>
... About nclients I already replied to you in another question: this is the level of parallelism used to run the specified command; in the default MapReduce mode it means that 10*nclients mappers will be started. About the options you asked for, and a few others I use: rows Rows each client runs. Default: One million
columns Columns to write per row. Default: 1
presplit Create presplit table. Recommended for accurate perf analysis (see guide). Default: disabled
compress Compression type to use (GZ, LZO, ...). Default: 'NONE'
table Alternate table name. Default: 'TestTable'
bloomFilter Bloom filter type, one of [NONE, ROW, ROWCOL]
valueSize Pass value size to use: Default: 1024
Example: hbase pe --table=TestTable2 --compress=GZ --presplit=4 randomWrite 5
And of course, first run one of the write commands, followed by some reads. For the results, look for the following lines in the output of the MR job: HBase Performance Evaluation
Elapsed time in milliseconds=492463
Row count=1048560 You can also prepend "time" and run as "time hbase pe ...". For more details search the web, though the results are scattered.
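From those two summary lines you can derive a throughput figure yourself; a minimal sketch using the numbers quoted above:

```python
# Throughput from the pe job counters quoted above.
elapsed_ms = 492463     # "Elapsed time in milliseconds"
row_count = 1048560     # "Row count"

rows_per_sec = row_count / (elapsed_ms / 1000)
print(f"{rows_per_sec:.0f} rows/sec")
```

With the default valueSize of 1024 bytes, multiplying rows/sec by 1024 gives an approximate payload throughput in bytes/sec.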
08-07-2017
06:40 PM
2 Kudos
Data is stored in 8K blocks on disk; these make up 128K stripes that are parity protected and striped across nodes and disks. Files smaller than 128K are mirrored instead. This provides a good balance between file size and storage efficiency: since Isilon storage is parity based, it gives better overall storage utilization. HDFS blocks of 128MB, for example, are triple mirrored when stored (note that this is configurable). As an example, on a 5 node Isilon cluster (very common) with N+1 protection, a file will be broken up into 4 data stripes and one parity stripe (aka 4+1) distributed across the cluster. That is a storage overhead of 1/4, or 25%, so the effective on-disk footprint is 125% for Isilon versus 300% for triple-mirrored HDFS. FWIW, Isilon speaks the HDFS protocol, and as such it uses the HDFS block size parameter to send files across the network; this value can be tuned to specific workflows and should correspond to the dfs.blocksize parameter.
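The footprint comparison above can be checked with a small calculation; this is a sketch under the post's 4+1 example, and the function names are illustrative only:

```python
# Compare on-disk footprint: Isilon N+M parity vs HDFS replication,
# using the 4+1 protection example from the post.
def parity_footprint(data_stripes: int, parity_stripes: int) -> float:
    """Bytes stored on disk per byte of user data under N+M parity."""
    return (data_stripes + parity_stripes) / data_stripes

def replication_footprint(replicas: int) -> float:
    """Bytes stored per byte of user data under simple replication."""
    return float(replicas)

print(parity_footprint(4, 1))    # 4+1 -> 1.25, i.e. 125% of raw data
print(replication_footprint(3))  # HDFS default 3x -> 3.0, i.e. 300%
```

Note the distinction that trips people up: the parity stripe is 20% of what lands on disk (1 of 5 stripes), but the overhead relative to user data is 25% (1 parity per 4 data).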