Member since 07-11-2017 | 44 Posts | 1 Kudo Received | 0 Solutions
09-24-2019
02:18 AM
Hi @hadoopguy Yes, there is an impact: processing will take longer and operations will be queued. You have to handle the timeouts in your jobs carefully. Best, @helmi_khalifa
06-07-2019
06:11 AM
Hi @Jay Kumar SenSharma, Yes, we have installed Ambari and Ranger on the same node, and we are using HDP 2.6.5 in our cluster. Now I have a clear picture of why we are getting this error. Thank you so much for answering and for your detailed explanation.
09-02-2017
01:49 AM
1 Kudo
Hi @Aishwarya Dixit, pe, or "Performance Evaluation", is a MapReduce-based tool to test reads and writes to HBase. nclients means that 10*nclients mappers will be started to run the supplied pe command. Example: hbase pe randomWrite 2
...
2017-09-02 01:31:17,681 INFO [main] mapreduce.JobSubmitter: number of splits:20 starts an MR job with 20 mappers and 1 reducer. So you can start with a small number like 1-3 to make sure HBase works as expected, and then increase it to roughly the maximum number of mappers you can run on your cluster divided by 10. You can of course use a larger number, but then the mappers will run in multiple "waves".
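The 10x rule above can be sketched as a quick back-of-the-envelope calculation; the function names here are just illustrative helpers, not part of any HBase API:

```python
import math

# Sketch of the sizing logic described above: each pe client
# is split into 10 mappers by default.
def pe_mappers(nclients: int) -> int:
    """Mappers started for 'hbase pe <command> <nclients>'."""
    return 10 * nclients

def waves(nclients: int, cluster_mapper_slots: int) -> int:
    """Mapper 'waves' when the job exceeds cluster capacity."""
    return math.ceil(pe_mappers(nclients) / cluster_mapper_slots)

print(pe_mappers(2))   # hbase pe randomWrite 2 -> 20 mappers
print(waves(5, 20))    # 50 mappers on 20 slots -> 3 waves
```

This is why the last "wave" of mappers often finishes alone: 50 mappers on 20 slots run as 20 + 20 + 10.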
09-02-2017
02:17 AM
1 Kudo
The best way to learn about various pe options is to run "hbase pe" without any options or commands: $ hbase pe
Usage: java org.apache.hadoop.hbase.PerformanceEvaluation <OPTIONS> [-D<property=value>]* <command> <nclients>
... About nclients I already replied to you in another question: this is the level of parallelism used to run the specified command; in the default MapReduce mode it means that 10*nclients mappers will be started. About the options you asked for, and a few others I use: rows Rows each client runs. Default: One million
columns Columns to write per row. Default: 1
presplit Create presplit table. Recommended for accurate perf analysis (see guide). Default: disabled
compress Compression type to use (GZ, LZO, ...). Default: 'NONE'
table Alternate table name. Default: 'TestTable'
bloomFilter Bloom filter type, one of [NONE, ROW, ROWCOL]
valueSize Pass value size to use: Default: 1024
Example: hbase pe --table=TestTable2 --compress=GZ --presplit=4 randomWrite 5
And of course, first run one of the write commands, followed by some reads. For the results, look for the following lines in the output of the MR job: HBase Performance Evaluation
Elapsed time in milliseconds=492463
Row count=1048560 You can also prepend "time" and run as "time hbase pe ...". For more details search the web, though the results are scattered.
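From those two summary lines you can derive a throughput figure yourself; a minimal sketch using the numbers quoted above:

```python
# Throughput from the pe job counters quoted above.
elapsed_ms = 492463     # "Elapsed time in milliseconds"
row_count = 1048560     # "Row count"

rows_per_sec = row_count / (elapsed_ms / 1000)
print(f"{rows_per_sec:.0f} rows/sec")
```

With the default valueSize of 1024 bytes, multiplying rows/sec by 1024 gives an approximate payload throughput in bytes/sec.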
08-07-2017
06:40 PM
2 Kudos
Data is stored in 8K blocks on disk; these make up 128K stripes that are parity protected and striped across nodes and disks. Files smaller than 128K are mirrored instead. This provides a good balance between file size and storage efficiency: since Isilon storage is parity based, it gives better overall storage utilization. HDFS blocks of 128MB, for example, are triple mirrored when stored (note that this is configurable). As an example, on a 5 node Isilon cluster (very common) with N+1 protection, a file will be broken up into 4 data stripes and one parity stripe (aka 4+1) distributed across the cluster. That is a storage overhead of 1/4, or 25%, so the effective on-disk footprint is 125% for Isilon versus 300% for triple-mirrored HDFS. FWIW, Isilon speaks the HDFS protocol, and as such it uses the HDFS block size parameter to send files across the network; this value can be tuned to specific workflows and should correspond to the dfs.blocksize parameter.
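The footprint comparison above can be checked with a small calculation; this is a sketch under the post's 4+1 example, and the function names are illustrative only:

```python
# Compare on-disk footprint: Isilon N+M parity vs HDFS replication,
# using the 4+1 protection example from the post.
def parity_footprint(data_stripes: int, parity_stripes: int) -> float:
    """Bytes stored on disk per byte of user data under N+M parity."""
    return (data_stripes + parity_stripes) / data_stripes

def replication_footprint(replicas: int) -> float:
    """Bytes stored per byte of user data under simple replication."""
    return float(replicas)

print(parity_footprint(4, 1))    # 4+1 -> 1.25, i.e. 125% of raw data
print(replication_footprint(3))  # HDFS default 3x -> 3.0, i.e. 300%
```

Note the distinction that trips people up: the parity stripe is 20% of what lands on disk (1 of 5 stripes), but the overhead relative to user data is 25% (1 parity per 4 data).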