We have an HBase cluster with one master node and 9 region nodes. We are loading 40M emails into a single HBase table, email, as only one row, and the row ID is prefixed with a salt. As we understand it, HBase places data on a region node based on the salt. We have an M/R job that reads rows from the HBase table and processes them. Since we have only one salt in our row ID, we are worried that a single M/R task may pull all 40M emails and process them, instead of the work being divided across all 9 region nodes (compute nodes) the way HDFS blocks are.
What is the solution? Do we need to create 9 different rows with different salts when we ingest the data, or is there another solution? Currently we have one row with 40M columns, and each column is an email.
If the solution is to create different salts while ingesting the data and there is any code for that, please let us know.
What do you mean by "only one row"? Do you mean one region instead? If so, yes: one region means one mapper task. You can split the table into several regions, either from the table's web UI or from the hbase shell.
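Since you asked for code, here is a minimal sketch of how a salt could be computed at ingest time, together with matching split points for the table. It assumes one row per email rather than one row with 40M columns, because a single HBase row always lives in one region and cannot be spread across region nodes. The names salt_prefix, salted_row_key, split_points and NUM_BUCKETS, the "|" separator and the two-digit salt format are all made up for illustration, and the choice of 9 buckets (one per region node) is an assumption, not a recommendation from the HBase docs.

```python
import hashlib

# Assumption: one salt bucket per region node (9 in the question).
NUM_BUCKETS = 9


def salt_prefix(key: str, buckets: int = NUM_BUCKETS) -> str:
    """Return a stable two-digit salt derived from the key itself.

    A hash of the key is used (not a random number) so that the same
    email always gets the same salt and can be looked up again later.
    """
    digest = hashlib.md5(key.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % buckets
    return f"{bucket:02d}"


def salted_row_key(email: str, buckets: int = NUM_BUCKETS) -> str:
    """Build the row key as '<salt>|<email>' (hypothetical layout)."""
    return f"{salt_prefix(email, buckets)}|{email}"


def split_points(buckets: int = NUM_BUCKETS) -> list:
    """Return the buckets - 1 boundaries that cut the key space
    into one region per salt value: '01', '02', ..."""
    return [f"{b:02d}" for b in range(1, buckets)]


if __name__ == "__main__":
    for addr in ("alice@example.com", "bob@example.com"):
        print(salted_row_key(addr))
    print(split_points())
```

The split points can then be passed when the table is created in the hbase shell, for example create 'emails', 'cf', SPLITS => ['01', '02', '03', '04', '05', '06', '07', '08'], where the table name 'emails' and the column family 'cf' are placeholders. With 9 regions, the M/R job should get 9 mapper tasks instead of one.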