HFile creation from Hive Table not working
Created ‎08-03-2018 07:34 AM
I found the following article about how to fill a HBase table with data from Hive: https://community.hortonworks.com/articles/2745/creating-hbase-hfiles-from-an-existing-hive-table.ht...
I followed the steps myself, and they seem to work. The problem appears when I run the following HiveQL:
set hfile.family.path=/tmp/test_hbase/cf;
set hive.hbase.generatehfiles=true;
INSERT OVERWRITE TABLE testdb.test_hbase
SELECT distinct concat_ws("_", name, number, test, step, cast(starttime as STRING)) as k, hashValue, valuelist
from testdb.test_orc
order by k, hashvalue
limit 1000;
I need to combine 4 columns to get a unique row key for my HBase table. Another problem is that my valueList column can contain huge strings, between 0 and 1 MB each.
When I run the query, Tez creates 100 containers for the Map tasks. They take a few minutes to complete, which is slow for 1000 rows, but acceptable. After the Map step, a Reduce step follows. In my opinion this could be the problem, because there is only 1 Reducer for this huge amount of data. That seems to be too few, as the job has now been running for hours and has still not completed.
My questions here:
- What are the Map and Reduce step doing in this scenario?
- Why is there only 1 Reducer?
- Can I somehow change this behavior (e.g. disable the Reduce step or use more Reducers)?
Thank you!
Created ‎08-03-2018 09:47 AM
Found following article: http://www.openkb.info/2017/05/hive-on-tez-how-to-control-number-of.html
That helped me. I now set the number of tasks to a fixed value.
Created ‎08-03-2018 02:36 PM
When you are generating HFiles for HBase, the typical pattern is that you have one reducer per Region, because each HFile must only contain data for a single Region. As such, changing the number of Reducers is mostly a matter of presplitting your table to increase the number of Reducers (or merging Regions to reduce it).
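To make the one-reducer-per-Region relationship concrete, here is a minimal Python sketch that lists the key ranges a presplit table would have. It only illustrates how split points partition the key space; it is not how HBase or the Hive storage handler is implemented. The function name `region_ranges` is hypothetical, and the split points are the ones from the `create` command posted later in this thread.

```python
def region_ranges(splits):
    """Return (start_key, end_key) pairs for a table presplit at `splits`.

    HBase orders row keys as raw bytes. An empty byte string marks the
    open start of the first region and the open end of the last one.
    """
    bounds = [b""] + sorted(s.encode("utf-8") for s in splits) + [b""]
    return list(zip(bounds[:-1], bounds[1:]))


# Split points '0' .. '9', as in the create command later in this thread.
ranges = region_ranges([str(d) for d in range(10)])

for start, end in ranges:
    print(start or b"(start of table)", "->", end or b"(end of table)")

# N split points give N + 1 regions, so this is the reducer count to
# expect under the one-reducer-per-Region pattern.
print(len(ranges))
```

Note that ten split points produce eleven key ranges, because the range below the first split point is a region of its own.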
Created ‎08-06-2018 12:40 PM
@Josh Elser Thank you for that information. I have now changed my HBase table creation to the following command:
create 'hbase_1m_10r', {NAME => 'cf'}, {SPLITS => ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']}
When running the following query:
INSERT OVERWRITE TABLE dmueller.hbase_1m_10r
SELECT concat_ws(":", cast((hashvalue % 10) as String), concat_ws("_", name, number, test, step, cast(starttime as STRING))) as k, valuelist
from (select * from testdb.test_orc limit 1000000) a
distribute by split(k, ":")[0]
sort by k
I still have only 1 reducer... Any idea why?
Created ‎08-06-2018 03:42 PM
Does your data actually span all of the regions you created split points for? Or, when this finishes generating the HFile, does the client end up having to split the HFiles (and not just load them)?
The only thing I can guess is that the HBaseStorageHandler isn't doing something right. Generating only one HFile when you have 10 regions is definitely suboptimal.
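One way to check the first question offline is to take a sample of the generated row keys and count how many fall into each region. The Python sketch below does that under stated assumptions: the helper `keys_per_region` and the sample keys are hypothetical, and the lookup only mimics byte-wise comparison of row keys against the split points from the `create` command above.

```python
from bisect import bisect_right
from collections import Counter


def keys_per_region(row_keys, splits):
    """Count sample row keys per region for a table presplit at `splits`.

    Index 0 is the region below the first split point; index i (i >= 1)
    is the region that starts at the i-th smallest split point.
    """
    bounds = sorted(s.encode("utf-8") for s in splits)
    # bisect_right treats a key equal to a split point as belonging to
    # the region that starts there (region start keys are inclusive).
    hits = Counter(bisect_right(bounds, k.encode("utf-8")) for k in row_keys)
    return [hits.get(i, 0) for i in range(len(bounds) + 1)]


splits = [str(d) for d in range(10)]

# Hypothetical keys in the "<salt>:<name>_<number>_..." shape of the query.
sample_keys = [
    "0:a_1_t_s_2018",
    "3:b_2_t_s_2018",
    "3:c_3_t_s_2018",
    "9:d_4_t_s_2018",
]

counts = keys_per_region(sample_keys, splits)
print(counts)  # [0, 1, 0, 0, 2, 0, 0, 0, 0, 0, 1]
```

If most of the counts are zero for a real sample, the data does not span the split points, and a single HFile would be less surprising.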
