About Spinhoo

Spinhoo · 03-05-2015

I figured out why the last reducer is taking so long: user error (it's me!). When I pre-split the table based on target regions, I failed to include all the keys. This resulted in a table whose last region was responsible for 80 times more data than the other regions, which is what caused that reducer to spend so much time. When the table is split evenly, all the reducers seem to finish close to each other.
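
For anyone hitting the same problem: with `import.bulk.output` set, the Import job runs one reducer per region of the target table, so a region that covers a much larger slice of the key space than the others turns into one very slow reducer. Below is a minimal Python sketch, not taken from any HBase tool, that assumes you can pull a representative sample of row keys from the exported data (the sampling itself is not shown). `quantile_split_points` and `region_counts` are hypothetical helper names: the first picks split keys at evenly spaced quantiles of the sample, the second counts how many sampled keys each resulting region would receive. Row keys are Python `bytes`, which compare byte by byte like HBase row keys.

```python
import bisect


def quantile_split_points(sample_keys, num_regions):
    """Hypothetical helper: pick up to num_regions - 1 split keys so that each
    region gets a roughly equal share of the sampled row keys."""
    keys = sorted(sample_keys)
    if num_regions < 2 or not keys:
        return []
    n = len(keys)
    splits = []
    for i in range(1, num_regions):
        candidate = keys[i * n // num_regions]
        # Split keys must be strictly increasing; duplicate sample keys
        # can yield the same candidate twice, so skip repeats.
        if not splits or candidate > splits[-1]:
            splits.append(candidate)
    return splits


def region_counts(keys, split_points):
    """Count keys per region. A region starting at split key s holds every
    key >= s up to the next split key; the last region is unbounded above."""
    counts = [0] * (len(split_points) + 1)
    for key in keys:
        counts[bisect.bisect_right(split_points, key)] += 1
    return counts


if __name__ == "__main__":
    # Made-up sample of 1000 uniformly spread row keys.
    sample = [b"row%05d" % i for i in range(1000)]

    even = quantile_split_points(sample, 4)
    print(even)                          # [b'row00250', b'row00500', b'row00750']
    print(region_counts(sample, even))   # [250, 250, 250, 250]

    # Splits that stop well short of the real key range: the last region
    # ends up holding almost everything.
    short = [b"row00010", b"row00020", b"row00030"]
    print(region_counts(sample, short))  # [10, 10, 10, 970]
```

The computed keys could then be passed to the HBase shell's `create` command through its `SPLITS` option. Quantiles of a sample only balance row counts, so if row sizes vary a lot between key ranges, weighting by bytes would be closer to what each reducer actually processes.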

Spinhoo · 03-04-2015

It just moved from the COPY to the SORT phase, so it's not hung, just terribly busy. I will try the solution you mentioned during the next import. I just hope each reducer does its own copy/sort/reduce for its region (which they are partially doing) instead of one big long one at the end...

Spinhoo · 03-04-2015

Hi,

I am upgrading our cluster from CDH3 to CDH4. As part of this project I created a parallel cluster that's now running CDH4, and I am now importing the HBase data that I exported from the old cluster and copied onto the new one. I am using the bulk load tool to import the data into the tables. Here is how it's been done:

1. Exported the HBase tables on CDH3.
2. Ran distcp to copy the exports to the new cluster.
3. Created the tables with pre-split regions.
4. Imported the data using the bulk load tool.

Here is the command that's being used:

hbase org.apache.hadoop.hbase.mapreduce.Import -Dimport.bulk.output=/backup/TABLE_NAME TABLE_NAME /import/TABLE_NAME

The map phase of this process goes pretty fast, but the reduce phase takes forever to finish. I pre-split the regions to increase the number of reducers, but the load still spends a lot of time on the last reducer. Is there any way I can improve the speed by getting all the reducers to finish at close to the same time?

To give some context, a 1.3 TB table spent 45 minutes finishing the map phase and another 1 hour 15 minutes finishing all but one reducer. The last reducer is still running after nearly 4 hours and is only 33% complete. I have more tables to import, and they are much larger.

Any help would be greatly appreciated. Please let me know if you need more information.

Thank you all in advance,
Venkat
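
One way to see where the time goes before launching another import: as far as I understand it, when `import.bulk.output` is set the Import tool configures the job through `HFileOutputFormat.configureIncrementalLoad`, which sets one reduce task per region and partitions row keys by the region start keys. The Python sketch below only mimics that routing; it does not call HBase. It assumes you have the table's split keys and a sample of row keys from the export (both shown here with made-up values), and the helper names `reducer_for_key` and `load_report` are hypothetical. It reports the number of sampled keys per reducer and the ratio of the busiest reducer to the median one; a ratio far above 1 predicts a single long-running reducer.

```python
import bisect
from statistics import median


def reducer_for_key(row_key, split_points):
    """Index of the region (and bulk-load reducer) whose key range holds
    row_key. Region i spans [split_points[i-1], split_points[i]); the first
    region has no lower bound and the last has no upper bound."""
    return bisect.bisect_right(split_points, row_key)


def load_report(sample_keys, split_points):
    """Return (keys per reducer, busiest reducer / median reducer)."""
    counts = [0] * (len(split_points) + 1)
    for key in sample_keys:
        counts[reducer_for_key(key, split_points)] += 1
    typical = median(counts)
    ratio = max(counts) / typical if typical else float("inf")
    return counts, ratio


if __name__ == "__main__":
    # Made-up sample: 1000 uniformly spread row keys.
    sample = [b"row%05d" % i for i in range(1000)]

    # Pre-split keys that miss most of the key range.
    counts, ratio = load_report(sample, [b"row00010", b"row00020", b"row00030"])
    print(counts, ratio)  # [10, 10, 10, 970] 97.0

    # Pre-split keys spread over the whole range.
    counts, ratio = load_report(sample, [b"row00250", b"row00500", b"row00750"])
    print(counts, ratio)  # [250, 250, 250, 250] 1.0
```

Row counts are only a proxy: if row sizes differ between key ranges, summing key-value sizes per reducer instead of counting keys would track the shuffle volume more closely.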

Last Visited	05-21-2015 01:50 PM

Member Since	03-04-2015 01:40 PM
Posts	4

Cloudera Community

Re: Hbase bulk load help, the last reducer is taki...

Hbase bulk load help, the last reducer is taking f...