Member since: 03-04-2015
Posts: 4
Kudos Received: 0
Solutions: 1
03-05-2015 11:14 AM
I figured out why the last reducer was taking so long: user error (it's me!). When I pre-split the table based on the target regions, I failed to include all the split keys. The result was a table whose last region (the one starting at the last split key) was responsible for 80 times more data than the other regions, which is why that reducer spent so much time. When the table is split evenly, all reducers seem to finish close to each other.
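A quick way to catch this kind of mistake before starting the import is to bucket a sample of row keys by the planned split keys and look at how many land in each region. The sketch below is only an illustration: `region_counts`, `skew_ratio`, and the sample keys are hypothetical names and data, and it assumes ASCII row keys so that Python string ordering matches HBase's byte-wise ordering.

```python
from bisect import bisect_right
from collections import Counter


def region_counts(split_keys, row_keys):
    """Count sampled row keys per region for a table pre-split at split_keys.

    n split keys define n + 1 regions; region i starts at split_keys[i - 1]
    (inclusive) and ends at split_keys[i] (exclusive).
    """
    splits = sorted(split_keys)
    counts = Counter(bisect_right(splits, key) for key in row_keys)
    return [counts.get(i, 0) for i in range(len(splits) + 1)]


def skew_ratio(counts):
    """Size of the largest region divided by the mean region size."""
    mean = sum(counts) / len(counts)
    return max(counts) / mean if mean else 0.0


# Hypothetical sample: 1000 row keys, but the split list stops early,
# so everything from "row300" onward falls into the last region.
sample = [f"row{i:03d}" for i in range(1000)]
incomplete_splits = ["row100", "row200", "row300"]

counts = region_counts(incomplete_splits, sample)
print(counts)              # [100, 100, 100, 700]
print(skew_ratio(counts))  # 2.8
```

A ratio well above 1 means one region, and therefore one reducer, will receive far more data than the rest.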
03-04-2015 02:39 PM
It just moved from the COPY phase to the SORT phase, so it's not hung, just terribly busy. I will try the solution you mentioned during the next import. I just hope each reducer does its own copy/sort/reduce for its region (which they are partially doing) instead of one big long one at the end...
03-04-2015 01:56 PM
Hi,

I am upgrading our cluster from CDH3 to CDH4. As part of this project I created a parallel cluster that's now running CDH4, and I am importing the HBase data that I exported and copied onto the new cluster. I am using the bulk load tool to import the data into the tables. Here is how it's been done:

1. Exported the HBase tables on CDH3
2. Ran distcp to the new cluster
3. Created the tables with pre-split regions
4. Imported the data using the bulk load tool

Here is the command that's being used:

hbase org.apache.hadoop.hbase.mapreduce.Import -Dimport.bulk.output=/backup/TABLE_NAME TABLE_NAME /import/TABLE_NAME

The map phase of this process goes pretty fast, but the reduce phase takes forever to finish. I pre-split the regions to increase the number of reducers, but the load still spends a lot of time on the last reducer. Is there any way to improve the speed by letting all the reducers finish close to the same time?

To give some context, a 1.3 TB table took 45 minutes to finish the map phase and another 1 hour 15 minutes to finish all but one reducer. The last reducer is still running after nearly 4 hours and is only 33% complete. I have more tables to import and they are much larger.

Any help would be greatly appreciated. Please let me know if you need more information.

Thank you all in advance,
Venkat
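Because the bulk-output job ends up with one reducer per region, reducers can only finish close together if the regions hold similar amounts of data. One way to get there is to take split keys from evenly spaced positions in a sorted sample of the exported row keys. The following is a minimal sketch, not part of the original procedure: `quantile_splits` and the sample keys are hypothetical, and it assumes ASCII row keys and a sample that is representative of the full table.

```python
def quantile_splits(sampled_keys, num_regions):
    """Pick num_regions - 1 split keys at evenly spaced positions in a sample.

    Each resulting region covers roughly the same number of sampled keys.
    Duplicate split points (possible with a repetitive sample) are dropped.
    """
    keys = sorted(sampled_keys)
    if num_regions < 2 or len(keys) < num_regions:
        return []
    step = len(keys) / num_regions
    picks = [keys[int(i * step)] for i in range(1, num_regions)]
    return list(dict.fromkeys(picks))  # de-duplicate, keep order


# Hypothetical sample of 1000 row keys, split into 4 regions.
sample = [f"row{i:03d}" for i in range(1000)]
print(quantile_splits(sample, 4))  # ['row250', 'row500', 'row750']
```

The resulting list would then be supplied as the split points when creating the table, for example through the `SPLITS` option of the HBase shell `create` command.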
Labels:
- Apache Hadoop
- Apache HBase
- MapReduce